Abstract

We say that a first order formula $\Phi$ distinguishes a graph $G$ from another
graph $G'$ if $\Phi$ is true on $G$ and false on $G'$. Provided $G$ and $G'$ are
non-isomorphic, let $D(G, G')$ denote the minimal quantifier rank of such a
formula. Let $n$ denote the order of $G$. We prove that, if $G'$ has the same
order, then $D(G, G') \le (n + 3)/2$. This bound is tight up to an additive constant
of 1. Furthermore, we prove that non-isomorphic $G$ and $G'$ of order $n$ are
distinguishable by an existential formula of quantifier rank at most $(n + 5)/2$. As
a consequence of the first result, we obtain an upper bound of $(n + 1)/2$ for the
optimum dimension of the Weisfeiler-Lehman graph canonization algorithm,
whose worst case value is known to be linear in $n$.

We say that a first order formula $\Phi$ defines a graph $G$ if $\Phi$ distinguishes
$G$ from every non-isomorphic graph $G'$. Let $D(G)$ be the minimal quantifier
rank of a formula defining $G$. As is well known, $D(G) \le n + 1$ and this
bound is generally best possible. Nevertheless, we here show that there is a
class $\mathcal{C}$ of graphs of simple, easily recognizable structure such that

• $D(G) \le (n + 5)/2$ with the exception of all graphs in $\mathcal{C}$;

• if $G \in \mathcal{C}$, then it is easy to compute the exact value of $D(G)$.

Moreover, the defining formulas in this result have only one quantifier
alternation. The bound for $D(G)$ can be improved for graphs with bounded
vertex degrees: For each $d \ge 2$ there is a constant $c_d < 1/2$ such that
$D(G) \le c_d n + O(d^2)$ for any graph $G$ with no isolated vertices and edges
whose maximum degree is $d$.

Finally, we extend our results to directed graphs, more generally to
arbitrary structures with maximum relation arity 2, and to $k$-uniform
hypergraphs.
1 Introduction



From the logical point of view, a graph $G$ is a structure with a single anti-reflexive
and symmetric binary predicate $E$ for the adjacency relation of $G$. Every closed first
order formula $\Phi$ with predicate symbols $E$ and $=$ is either true or false on $G$. Given
two non-isomorphic graphs $G$ and $G'$, we say that $\Phi$ distinguishes $G$ from $G'$ if $\Phi$
is true on $G$ but false on $G'$. The quantifier rank of $\Phi$ is the maximum number of
nested quantifiers in this formula (see Section 2 for formal definitions). Let $D(G, G')$
denote the minimum quantifier rank of a formula distinguishing $G$ from $G'$. The
number $D(G, G')$ is symmetric with respect to $G$ and $G'$ because, if $\Phi$ distinguishes
$G$ from $G'$, then $\neg\Phi$ distinguishes $G'$ from $G$ and has the same quantifier rank.

The order of a graph is the number of its vertices. Throughout the paper the 
letter n will denote the order of a graph G. 

The first question we address is how large $D(G, G')$ can be over non-isomorphic
graphs $G$ and $G'$ of the same order $n$. For any $n$ it is not hard to find such a pair
with

$$D(G, G') \ge (n + 1)/2$$

(see Example 2.13). On the other hand, there is an obvious general upper bound

$$D(G, G') \le n.$$

Indeed, a graph $G$ with vertex set $V(G) = \{1, \ldots, n\}$ and edge set $E(G)$ is
distinguished from any non-isomorphic $G'$ of order $n$ by the formula

$$\exists x_1 \ldots \exists x_n \Bigl( \bigwedge_{i \ne j} \neg(x_i = x_j) \wedge \bigwedge_{\{i,j\} \in E(G)} E(x_i, x_j) \wedge \bigwedge_{\{i,j\} \notin E(G)} \neg E(x_i, x_j) \Bigr). \qquad (1)$$

It seems that no better upper bound has been reported in the literature so far. Here 
we prove a nearly best possible bound. 
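To make formula (1) concrete, the following minimal sketch evaluates it on a second graph by brute force. The function name `satisfies_formula_1` and the edge-list encoding are illustrative assumptions, not notation from the paper; the code simply searches for an assignment of pairwise distinct vertices to $x_1, \ldots, x_n$ that respects all adjacencies and non-adjacencies of $G$.

```python
from itertools import permutations

def satisfies_formula_1(n, edges_g, vertices_h, edges_h):
    """Evaluate formula (1), built from a graph G on {0, ..., n-1}, on a graph H.

    Hypothetical helper: edges are given as pairs, vertices_h is any iterable.
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    # Each tuple of n pairwise distinct vertices of H is one candidate
    # witness for the existential prefix of formula (1).
    for xs in permutations(vertices_h, n):
        if all(
            (frozenset((i, j)) in eg) == (frozenset((xs[i], xs[j])) in eh)
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False
```

On a graph of the same order $n$ the formula holds exactly when the graph is isomorphic to $G$. On a larger graph it only asserts an induced copy of $G$, which is why the bound $D(G, G') \le n$ is stated for graphs of the same order.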

Theorem 1.1 If $G$ and $G'$ are non-isomorphic graphs both of order $n$, then

$$D(G, G') \le (n + 3)/2.$$

It is worth noting that the distinguishing formulas resulting from our proof of
Theorem 1.1 have a rather restricted logical structure. We say that a first order
formula $\Phi$ is in negation normal form if the connective $\neg$ occurs in $\Phi$ only in
front of atomic subformulas. If $\Phi$ is such a formula, its alternation number is the
maximum number of alternations of $\exists$ and $\forall$ in a sequence of nested quantifiers of $\Phi$.
The proof of Theorem 1.1 produces distinguishing formulas in negation normal
form whose alternation number is at most 1. We are able to prove Theorem 1.1 even
with alternation number 0 but with a slightly weaker bound $D(G, G') \le (n + 5)/2$. The
proof actually produces either an existential or a universal distinguishing formula.
An existential or universal formula is a formula in negation normal form all of
whose quantifiers are of only one sort, existential or universal respectively.






Our proof of these results is based on the well-known combinatorial
characterization of $D(G, G')$ in terms of the Ehrenfeucht game on $G$ and $G'$ [10] (Fraïssé [11]
suggested an essentially equivalent characterization in terms of partial isomorphisms
between $G$ and $G'$). In this setting, $D(G, G')$ is equal to the length of the game
under the condition that the players play optimally. Thus Theorem 1.1 says that the
Ehrenfeucht game on non-isomorphic graphs of the same order $n$ can be won within
at most $(n + 3)/2$ rounds.

Theorem 1.1 has consequences for the complexity analysis of the Weisfeiler- 
Lehman algorithm for graph isomorphism testing. This algorithm has been studied 
since the seventies (see e.g. [2, 8]). An important combinatorial parameter of the 
algorithm, occurring in the known bounds on the running time, is its dimension. De- 
note the optimum dimension for input graphs G and G' by WL(G, G') (see Section 
7 for a detailed exposition). 

Cai, Fürer, and Immerman [8] came up with a remarkable construction of
non-isomorphic $G$ and $G'$ of the same order for which $WL(G, G') = \Omega(n)$. Though they
do not specify the constant hidden in the $\Omega$-notation, a simple analysis of their
proof, which we make in Section 7.5, gives at least

$$WL(G, G') > 0.00465\,n. \qquad (2)$$

In the same paper, the authors give a logical characterization of $WL(G, G')$ which
readily implies the relation

$$WL(G, G') \le D(G, G') - 1. \qquad (3)$$

Thus, our Theorem 1.1 establishes an upper bound

$$WL(G, G') \le 0.5\,n + 0.5. \qquad (4)$$

We have to remark that this bound for the dimension does not imply any good
bound for the worst case running time. In fact, the lower bound (2) shows that the
Weisfeiler-Lehman algorithm can hardly be practical in the worst case. However,
we believe that (4) shows an interesting combinatorial property of the algorithm
that was not observed previously.

By providing upper bounds for the length of the Ehrenfeucht game and for the
dimension of the Weisfeiler-Lehman algorithm, Theorem 1.1 is meaningful
from the combinatorial and computer science points of view. Let us now focus on
logical motivations in the scope of finite model theory.

We say that a formula $\Phi$ identifies a graph $G$ (up to an isomorphism in the class
of graphs of the same order) if $\Phi$ distinguishes $G$ from any other non-isomorphic
graph $G'$ of the same order. Let $D(G)$ denote the minimum quantifier rank of
such a formula. Note that, if $G$ is distinguished from $G'$ by a formula $\Phi_{G'}$, then $G$ is
identified by the conjunction $\bigwedge_{G'} \Phi_{G'}$ over all non-isomorphic $G'$ of the same order.
It easily follows that $D(G) = \max D(G, G')$ over all such $G'$. In these terms Theorem
1.1 reads $D(G) \le (n + 3)/2$.

However, from the point of view of finite model theory it is more natural to
address defining rather than distinguishing formulas. A first order formula $\Phi$ defines
a graph $G$ if $\Phi$ distinguishes $G$ from all non-isomorphic graphs, regardless of their
order. Let $D(G)$ be the minimum quantifier rank of a formula defining $G$. As is
well known,

$$D(G) \le n + 1$$

for every $G$ (we have to append $\forall x_{n+1}$ to the quantifier prefix of (1) and say that any
vertex $x_{n+1}$ is equal to one of $x_1, \ldots, x_n$). This bound cannot be generally improved
because, for example, no formula of quantifier rank $n$ can distinguish between two
complete graphs of orders $n$ and $n + 1$. Nevertheless, our next aim is to suggest a
bound for $D(G)$ similar to Theorem 1.1 and explicitly describe all the exceptions.
We are able to prove a dichotomy result: Either $D(G) \le n/2 + O(1)$ or else $G$ has
a simple structure and, moreover, in the latter case $D(G)$ is efficiently computable.

Theorem 1.2 There is an efficiently recognizable class of graphs $\mathcal{C}$ such that

• $D(G) \le (n + 5)/2$ with the exception of all graphs in $\mathcal{C}$;

• if $G \in \mathcal{C}$, then the exact value of $D(G)$ is efficiently computable.

Moreover, every graph $G$ admits a defining formula whose alternation number is
1 and whose quantifier rank is as small as possible if $G \in \mathcal{C}$ and does not exceed
$(n + 5)/2$ if $G \notin \mathcal{C}$.

Referring to the efficiency here, we mean the time $O(n^2 \log n)$ on a random access
machine with input graphs given by their adjacency matrices. T. Łuczak [22] poses
the question whether $D(G)$ is computable. It seems plausible that it is not. The simulation
of undecidable problems by finite structures, as in the Trakhtenbrot theorem [28]
(cf. [29] and [26, Theorem 8.2.1]), can be considered evidence in favor of this
hypothesis. In this respect, Theorem 1.2 provides us with a non-trivial computable
upper bound for $D(G)$.

The bound of Theorem 1.2 can be improved for graphs with bounded vertex
degree: For each $d \ge 2$ there is a constant $c_d < 1/2$ such that $D(G) \le c_d n + O(d^2)$
for any graph $G$ with no isolated vertices and edges whose maximum degree is $d$.
We do not try to find the best possible $c_d$, being content with a constant strictly
less than 1/2. Note that no sublinear bound is possible here. It is easy to show,
for example, that $D(G_n) > n/(d + 1)$ for $n = m(d + 1)$ and $G_n$ being the vertex
disjoint union of $m$ copies of the complete graph on $d + 1$ vertices. Moreover, no
sublinear upper bound is possible even for connected graphs, as follows from the
Cai-Fürer-Immerman bounds (2) and (3) and because $G$ and $G'$ in their construction
are connected graphs of bounded degree (see Section 7.5).

With minor efforts, Theorems 1.1 and 1.2 carry over to directed graphs and, more 
generally, to arbitrary relational structures with maximum arity 2. In combinatorial 
terms, the latter are directed graphs endowed with colorings of the vertex set and 
the edge set. 

Finally, we prove somewhat weaker analogs of Theorems 1.1 and 1.2 for $k$-uniform
hypergraphs. The upper bound we obtain here is $(1 - 1/k)n + 2k - 1$. It remains
open if this bound is tight for $k \ge 3$ since the only lower bound we know for any $k$
is $(n + 1)/2$.
Previous work

Graph identification is studied in [16, 17, 8, 12, 13, 14] in aspects relevant to
computer science. The main focus of this line of research is on the minimum number
of variables used in an identifying formula, where formulas are in the first order
language enriched by counting quantifiers. A counting quantifier $\exists^{\ge m} x\, \Psi$ means
that there are at least $m$ vertices $x$ for which the statement $\Psi$ holds. Let $L(G, G')$
denote the minimum number of variables in a formula distinguishing non-isomorphic
graphs $G$ and $G'$ (different occurrences of the same variable are not counted). Let
$C(G, G')$ be the analog of this number for the logic with counting quantifiers. Since
every formula of quantifier rank $r$ can be rewritten in equivalent form using only $r$
variables, we have

$$C(G, G') \le L(G, G') \le D(G, G').$$

Assuming that $G$ and $G'$ are of the same order, our Theorem 1.1 establishes an
upper bound of $(n + 3)/2$ for the whole hierarchy. The aforementioned lower bound
by Cai, Fürer, and Immerman [8] is actually $C(G, G') = \Omega(n)$, for infinitely many
non-isomorphic $G$ and $G'$. Their characterization of the optimum dimension of the
Weisfeiler-Lehman algorithm is $WL(G, G') = C(G, G') - 1$.

Let $L(G)$ (resp. $C(G)$) be the minimum number of variables in a formula (resp.
with counting quantifiers) identifying a graph $G$. By the aforementioned relationship
between distinguishing and identifying formulas, we have $L(G) = \max L(G, G')$ over
all $G'$ of order $n$ non-isomorphic with $G$. Similarly, $C(G) = \max C(G, G')$ over
$G'$ non-isomorphic with $G$, where $G'$ may be of any order since graphs of different
orders are easily distinguishable in the logic with counting quantifiers. Thus, $C(G)$
is equal to the minimum number of variables in a formula with counting quantifiers
that defines $G$.

Define $D(n)$, $L(n)$, and $C(n)$ to be the maximum possible values of $D(G)$, $L(G)$,
and $C(G)$, respectively, over graphs of order $n$. We now know that

$$0.00465\,n < C(n) \le L(n) \le D(n) \le 0.5\,n + 1.5,$$

where the upper bound is given by Theorem 1.1 and the lower bound is actually the
Cai-Fürer-Immerman bound (2). The values of $L(n)$ and $D(n)$ are at most 1 apart
from the upper bound. An interesting open question is where in this range $C(n)$ is
located.

It is known that $C(G) = 2$ for almost all graphs of a given order, $C(G) = 3$
for almost all regular graphs of a given order and degree, $C(G) = 2$ for all trees
[4, 5, 21, 17], $C(G) = O(g)$ for graphs of genus $g$ [12, 13], and $C(G) \le k + 2$ for
graphs of tree-width $k$ [14]. For strongly regular graphs it holds that $C(G) = O(\sqrt{n} \log n)$,
where $n$ denotes the order of $G$ [3]. If $G$ has a separator of size $O(n^\delta)$, $0 < \delta < 1$,
then $C(G) = O(n^\delta)$ [8]. This result applies, in particular, to classes of graphs with
excluded minors, which have separators of size $O(\sqrt{n})$ [1].

Estimation of $D(G, G')$ is an interesting research problem not only for graphs
but also for any class of structures. The case of words is considered in [27] in the
context of Zero-One Laws of first order logic, where $D(W, W')$ is estimated for $W$
and $W'$ being independent random binary words of length $n$.
Subsequent work

In [30] it is shown that $D(G) = O(\log n)$ if $G$ is a tree of bounded degree or a
Hamiltonian outerplanar graph. This upper bound complements the popular lower
bounds $D(P_n, P_{n+1}) > \log_2 n - 3$ (e.g. [26, Theorem 2.1.3]) and $D(C_n, C_{n+1}) > \log_2 n$
(e.g. [9, Example 2.3.8]), where $P_n$ is the path and $C_n$ is the cycle on $n$ vertices.

As already mentioned, in the present paper we manage to extend Theorem 1.1
from graphs to arbitrary structures with maximum relation arity 2 and to $k$-uniform
hypergraphs. Extension to arbitrary structures $G$ and $G'$ with maximum relation
arity $k$ seems a much more subtle problem. It is done in [25] with bound
$(1 - \frac{1}{2k})n + k^2 - k + 2$, even if we restrict the alternation number of a
distinguishing formula to 1. The same bound is established for $D(G)$ provided no
transposition of two elements of $G$ is an automorphism of the structure.

Theorem 1.1 gives a worst case upper bound on $D(G, G')$ for graphs of the
same order. In [18] the average case is analyzed and it is proved that, for random
independent $G$ and $G'$ on $n$ vertices, with probability $1 - o(1)$ we have
$D(G, G') = \log_2 n\,(1 + o(1))$. Moreover, $D(G)$ for a random graph $G$ is determined with high
precision: With probability $1 - o(1)$,

$$\log_2 n - 2 \log_2 \log_2 n \le D(G) \le \log_2 n - \log_2 \log_2 n + \log_2 \log_2 \log_2 n + O(1).$$

The upper bound here holds even if the alternation number of defining formulas 
is restricted to 1. A logarithmic upper bound is also proved with the alternation 
number restricted to 0. Together with Theorem 5.1 in the present paper, this shows 
that bounding the alternation number does not affect much the maximal and the 
average values of D(G, G') for graphs of the same order. 

In [24] the "best case" behavior of $D(G)$ is investigated. Namely, let $g(n) =
\min D(G)$ over graphs of order $n$. It is not hard to see that $g(n) \to \infty$ as $n \to \infty$
but it is not so clear how fast or slowly $g(n)$ grows. In [24] it is proved that $g(n)$
can be so small compared to $n$ that the gap between the two numbers cannot
be estimated by any computable function, i.e., there is no recursive function $f$
such that $n < f(g(n))$ for all $n$. However, if we "regularize" $g(n)$ by considering
$\hat{g}(n) = \max_{m \le n} g(m)$, we have $\hat{g}(n) = (1 + o(1)) \log^* n$, where $\log^* n$ is the smallest
number of iterations of the logarithm base 2 that suffices to decrease $n$ below 1. This
result is proved even under the restriction of the alternation number of a defining
formula to a constant. Under the strongest restriction of the alternation number to
0, an infinite family of graphs with $D(G) \le 2 \log^* n + O(1)$ is constructed.

Organization of the paper 

The paper is organized as follows. Section 2 contains the relevant definitions from 
graph theory and logic as well as the basic facts on the Ehrenfeucht games. In 
Sections 3 and 4 we prove Theorems 1.1 and 1.2, restated there as Theorems 3.15 
and 4.6. In Section 5 we prove a variant of Theorem 1.1 for distinguishing formulas 
whose alternation number is restricted to 0. The case of graphs with bounded vertex 
degree is considered in Section 6. Section 7 gives an exposition of the Weisfeiler- 
Lehman algorithm and applies Theorem 1.1 for its analysis. This section is mostly 
expository and, in particular, includes computation of an explicit constant in the
Cai-Fürer-Immerman bound (2). Section 8 discusses the extensions of Theorems 1.1
and 1.2 to directed graphs and relational structures of maximum arity 2. Section 9
is devoted to $k$-uniform hypergraphs. Section 10 lists open problems.

2 Preliminaries 

2.1 Structures 

2.1.1 Graphs 

Given a graph $G$, we denote its vertex set by $V(G)$ and its edge set by $E(G)$. The
order of $G$ will be sometimes denoted by $|G|$, that is, $|G| = |V(G)|$. The neighborhood
of a vertex $v$ consists of all vertices adjacent to $v$ and is denoted by $\Gamma(v)$.

The complement of $G$, denoted by $\overline{G}$, is the graph on the same vertex set $V(G)$
with all those edges that are not in $E(G)$. Given $G$ and $G'$ with disjoint vertex
sets, we define the sum (or disjoint union) $G \sqcup G'$ to be the graph with vertex set
$V(G) \cup V(G')$ and edge set $E(G) \cup E(G')$.

A set $S \subseteq V(G)$ is called independent (or stable) if it contains no pair of adjacent
vertices. $S$ is a clique if all vertices in $S$ are pairwise adjacent. The independence
(or stability) number of $G$, denoted by $\alpha(G)$, is the largest number of vertices in
an independent set of $G$. The clique number of $G$, denoted by $\omega(G)$, is the largest
number of vertices in a clique of $G$. The complete graph of order $n$, denoted by $K_n$,
is a graph of order $n$ whose vertex set is a clique. The complement of $K_n$ is the
empty graph of order $n$. The complete bipartite graph with vertex classes $V_1$ and $V_2$,
where $V_1 \cap V_2 = \emptyset$, is a graph with the vertex set $V_1 \cup V_2$ and the edge set consisting
of all edges $\{v_1, v_2\}$ for $v_1 \in V_1$ and $v_2 \in V_2$.

If $X \subseteq V(G)$, then $G[X]$ denotes the subgraph induced by $G$ on $X$ (or spanned
by $X$ in $G$). If $X, Y \subseteq V(G)$ are disjoint, then $G[X, Y]$ denotes the bipartite graph
induced by $G$ on vertex classes $X$ and $Y$, that is, $V(G[X, Y]) = X \cup Y$ and its edges
are exactly those edges of $G$ connecting a vertex in $X$ with a vertex in $Y$.

We call $X \subseteq V(G)$ homogeneous if it is a clique or an independent set. We call a
pair of disjoint sets $X, Y \subseteq V(G)$ homogeneous if $G[X, Y]$ is a complete or an empty
bipartite graph.
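As an illustration of the two notions of homogeneity, here is a minimal sketch. It assumes a graph is given as a dictionary `adj` mapping each vertex to the set of its neighbours; the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations

def is_homogeneous_set(adj, X):
    """X is homogeneous if it is a clique or an independent set."""
    pairs = [u in adj[v] for u, v in combinations(X, 2)]
    return all(pairs) or not any(pairs)

def is_homogeneous_pair(adj, X, Y):
    """Disjoint X, Y form a homogeneous pair if G[X, Y] is complete or empty."""
    cross = [y in adj[x] for x in X for y in Y]
    return all(cross) or not any(cross)
```

For the 4-cycle with edges 0-1, 1-2, 2-3, 3-0, the set {0, 2} is homogeneous (independent), and the pair ({0, 2}, {1, 3}) is homogeneous (complete bipartite).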

2.1.2 General relational structures 

A vocabulary is a finite sequence $R_1, \ldots, R_m$ of relation symbols along with a
sequence $k_1, \ldots, k_m$ of positive integers, where each $k_i$ is the arity of the respective
$R_i$. If $\mathcal{L}$ is a vocabulary, a finite structure $G$ over $\mathcal{L}$ (or $\mathcal{L}$-structure $G$) is a finite
set $V(G)$, called the universe, along with relations $R_1^G, \ldots, R_m^G$, where $R_i^G$ has arity
$k_i$. The order of a structure $G$ is the number of elements in the universe $V(G)$. If
$U \subseteq V(G)$, then $G$ induces on $U$ the structure $G[U]$ with universe $V(G[U]) = U$ and
relations $R_1^{G[U]}, \ldots, R_m^{G[U]}$ such that $R_i^{G[U]}(\bar{v}) = R_i^G(\bar{v})$ for every $\bar{v} \in U^{k_i}$. Two
$\mathcal{L}$-structures $G$ and $H$ are isomorphic if there is a one-to-one map $\phi : V(G) \to V(H)$,
called an isomorphism from $G$ to $H$, such that $R_i^G(\bar{v}) = R_i^H(\phi(\bar{v}))$ for every $i \le m$
and all $\bar{v} \in V(G)^{k_i}$. If $G$ and $H$ are isomorphic, we write $G \cong H$. An
automorphism of $G$ is an isomorphism from $G$ to itself. If $U \subseteq V(G)$ and $W \subseteq V(H)$, we
call a one-to-one map $\phi : U \to W$ a partial isomorphism from $G$ to $H$ if it is an
isomorphism from $G[U]$ to $H[W]$.

All these notions coincide with their standard graph-theoretic counterparts if we
consider graphs as structures with a single symmetric and anti-reflexive binary relation.

2.2 Logic 

First order formulas are assumed to be over the set of connectives $\{\neg, \wedge, \vee\}$.

Definition 2.1 A sequence of quantifiers is a finite word over the alphabet $\{\exists, \forall\}$.
If $S$ is a set of such sequences, then $\exists S$ (resp. $\forall S$) means the set of concatenations
$\exists s$ (resp. $\forall s$) for all $s \in S$. If $s$ is a sequence of quantifiers, then $\bar{s}$ denotes the result
of replacing all occurrences of $\exists$ by $\forall$ and vice versa in $s$. The set $\bar{S}$ consists of
all $\bar{s}$ for $s \in S$.

Given a first order formula $\Phi$, its set of sequences of nested quantifiers is denoted
by $\mathrm{Nest}(\Phi)$ and defined by induction as follows:

1) $\mathrm{Nest}(\Phi) = \{\epsilon\}$ if $\Phi$ is atomic; here, $\epsilon$ denotes the empty word.

2) $\mathrm{Nest}(\neg\Phi) = \overline{\mathrm{Nest}(\Phi)}$.

3) $\mathrm{Nest}(\Phi \wedge \Psi) = \mathrm{Nest}(\Phi \vee \Psi) = \mathrm{Nest}(\Phi) \cup \mathrm{Nest}(\Psi)$.

4) $\mathrm{Nest}(\exists x \Phi) = \exists\,\mathrm{Nest}(\Phi)$ and $\mathrm{Nest}(\forall x \Phi) = \forall\,\mathrm{Nest}(\Phi)$.

Definition 2.2 The quantifier rank of a formula $\Phi$, denoted by $\mathrm{qr}(\Phi)$, is the
maximum length of a string in $\mathrm{Nest}(\Phi)$.

Proposition 2.3 Let $\Phi$ be a first order formula with $\mathrm{qr}(\Phi) = k$ and suppose that
none of the variables $x_1, \ldots, x_k$ occurs in $\Phi$. Then there is an equivalent formula $\Psi$
whose bound variables are all in the set $\{x_1, \ldots, x_k\}$.

Proof. Let $m = \mathrm{conn}(\Phi)$, where $\mathrm{conn}(\Phi)$ denotes the number of connectives in $\Phi$.
We proceed by induction on $k$ and $m$. The base cases of $k = 0$, $m$ arbitrary and
$m = 0$, $k$ arbitrary are straightforward. Assume the claim is true for all formulas $\Phi'$
with $\mathrm{qr}(\Phi') \le k$, $\mathrm{conn}(\Phi') < m$ and with $\mathrm{qr}(\Phi') < k$, $\mathrm{conn}(\Phi') \le m$. If $\Phi = \neg\Phi'$, or
$\Phi = \Phi' \wedge \Phi''$, or $\Phi = \Phi' \vee \Phi''$, we apply the induction assumption to $\Phi'$ and $\Phi''$, obtain
equivalent formulas $\Psi'$ and $\Psi''$, and set $\Psi = \neg\Psi'$, or $\Psi = \Psi' \wedge \Psi''$, or $\Psi = \Psi' \vee \Psi''$
respectively. If $\Phi = \exists x \Phi'$ or $\Phi = \forall x \Phi'$, we apply the induction assumption to $\Phi'$ to
obtain an equivalent formula $\Psi'$ with bound variables in $\{x_1, \ldots, x_{k-1}\}$ and rename
$x$ to $x_k$. ■

We adopt the notion of the alternation number of a formula (cf. [23, Definition
2.8]).



Definition 2.4 Given a sequence of quantifiers $s$, let $\mathrm{alt}(s)$ denote the number of
occurrences of $\exists\forall$ and $\forall\exists$ in $s$. The alternation number of a first order formula $\Phi$,
denoted by $\mathrm{alt}(\Phi)$, is the maximum $\mathrm{alt}(s)$ over $s \in \mathrm{Nest}(\Phi)$.
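The inductive definitions of the set of sequences of nested quantifiers, the quantifier rank, and the alternation number can be sketched as follows. The encoding is an assumption made for illustration only: formulas are nested tuples tagged `"atom"`, `"not"`, `"and"`, `"or"`, `"exists"`, `"forall"`, and quantifier sequences are strings over the letters `E` (existential) and `A` (universal). Negation swaps the two quantifiers, as in the bar operation of Definition 2.1.

```python
def bar(s):
    """Swap existential (E) and universal (A) quantifiers in a sequence."""
    return s.translate(str.maketrans("EA", "AE"))

def nest(f):
    """Set of sequences of nested quantifiers of a formula given as a tuple tree."""
    op = f[0]
    if op == "atom":
        return {""}  # the empty word
    if op == "not":
        return {bar(s) for s in nest(f[1])}
    if op in ("and", "or"):
        return nest(f[1]) | nest(f[2])
    if op == "exists":
        return {"E" + s for s in nest(f[1])}
    if op == "forall":
        return {"A" + s for s in nest(f[1])}
    raise ValueError(f"unknown connective: {op}")

def qr(f):
    """Quantifier rank: maximum length of a sequence in nest(f)."""
    return max(len(s) for s in nest(f))

def alt(f):
    """Alternation number: maximum number of adjacent EA or AE pairs."""
    return max(sum(a != b for a, b in zip(s, s[1:])) for s in nest(f))
```

For instance, a formula of the shape "exists x, not exists y (atom and forall z atom)" has the sequences `EA` and `EAE`, hence quantifier rank 3 and alternation number 2 under this encoding.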

Definition 2.5 Let $G$ and $G'$ be non-isomorphic structures over the same
vocabulary $\mathcal{L}$ and $\Phi$ be a first order formula over vocabulary $\mathcal{L} \cup \{=\}$. We say that
$\Phi$ distinguishes $G$ from $G'$ if $\Phi$ is true on $G$ but false on $G'$. By $D(G, G')$ (resp.
$D_k(G, G')$) we denote the minimum quantifier rank of a formula (with alternation
number at most $k$ resp.) distinguishing $G$ from $G'$. By $L(G, G')$ we denote the
minimum $l$ such that over the variable set $\{x_1, \ldots, x_l\}$ there is a formula distinguishing
$G$ from $G'$.

Note that

$$L(G, G') \le D(G, G') \le D_k(G, G') \le D_{k-1}(G, G')$$

for every $k \ge 1$, where the first inequality follows from Proposition 2.3.

Definition 2.6 We say that $\Phi$ defines a structure $G$ (up to isomorphism) if $\Phi$
distinguishes $G$ from any non-isomorphic structure $G'$ over the same vocabulary. By
$D(G)$ (resp. $D_k(G)$) we denote the minimum quantifier rank of a formula defining
$G$ (with alternation number at most $k$ resp.).

Definition 2.7 Let $G$ be an $\mathcal{L}$-structure. We define

$$L(G) = \max \{ L(G, G') : G' \not\cong G \},$$

where $G'$ ranges over all $\mathcal{L}$-structures non-isomorphic with $G$.¹

¹ In other terms, $L(G)$ is equal to the minimum $l$ such that $G$ is definable in the $l$-variable
fragment of the infinitary logic $\mathrm{FO}_{\infty\omega}$ (see [9]).

Proposition 2.8 Let $G$ be an $\mathcal{L}$-structure. Then

$$D(G) = \max \{ D(G, G') : G' \not\cong G \},$$
$$D_k(G) = \max \{ D_k(G, G') : G' \not\cong G \},$$

where $G'$ ranges over all $\mathcal{L}$-structures non-isomorphic with $G$. Moreover, if $G$ is a
graph, one can suppose that $G'$ ranges in the class of graphs rather than in the class
of structures with a single binary relation.

Proof. We prove the first equality; the proof of the second equality is similar.
Given $G'$ non-isomorphic with $G$, let $\Phi_{G'}$ be a formula of minimum quantifier rank
distinguishing $G$ from $G'$, that is, $\mathrm{qr}(\Phi_{G'}) = D(G, G')$. Let $R = \max_{G'} \mathrm{qr}(\Phi_{G'})$.
We have $D(G) \ge R$ because $D(G) \ge D(G, G')$ for every $G'$. To prove the reverse
inequality $D(G) \le R$, notice that $G$ is defined by the formula $\Phi = \bigwedge_{G'} \Phi_{G'}$ whose
quantifier rank is $R$. The only problem is that $\Phi$ is an infinite conjunction (an
$\mathrm{FO}_{\infty\omega}$-formula). However, as is well known, over a fixed finite vocabulary there are only
finitely many inequivalent first order formulas of bounded quantifier rank (see e.g.
[8, 9, 15]). We therefore can reduce $\Phi$ to a finite conjunction.

If $G$ is a graph, we do not need to consider $G'$ which are not graphs because those
are distinguished from $G$ by the formula $\exists x_1 E(x_1, x_1) \vee \exists x_1 \exists x_2 (E(x_1, x_2) \wedge \neg E(x_2, x_1))$.
Notice that this formula has rank 2 while no formula of rank 1 can define $G$. ■

Proposition 2.9 Let $G$ and $G'$ be non-isomorphic graphs, ordinary or directed.

1) $D(G, G') = D(\overline{G}, \overline{G'})$, $D_k(G, G') = D_k(\overline{G}, \overline{G'})$, and $L(G, G') = L(\overline{G}, \overline{G'})$.

2) $D(G) = D(\overline{G})$, $D_k(G) = D_k(\overline{G})$, and $L(G) = L(\overline{G})$.

Proof. 1) If $\Phi$ is a first order formula, let $\overline{\Phi}$ be the result of putting $\neg$ in front
of every occurrence of the predicate symbol $E$ in $\Phi$. As easily seen, $G \models \Phi$ iff
$\overline{G} \models \overline{\Phi}$. It remains to note that $\Phi$ and $\overline{\Phi}$ have the same number of variables, the
same sequences of nested quantifiers, and, consequently, equal quantifier ranks and
alternation numbers.

2) The second item follows from the first one by Proposition 2.8. ■

2.3 Games 

The Ehrenfeucht game is played on a pair of structures of the same vocabulary. In 
the case of graphs, if we want to translate the definition below into the language of 
graph theory, we just have to replace structures with graphs, elements with vertices, 
and universe with vertex set. 

Definition 2.10 Let $G$ and $G'$ be structures with disjoint universes. The $r$-round
$l$-pebble Ehrenfeucht game on $G$ and $G'$, denoted by $\mathrm{Ehr}_r^l(G, G')$, is played by two
players, Spoiler and Duplicator, with $l$ pairwise distinct pebbles $p_1, \ldots, p_l$, each given
in duplicate. Spoiler starts the game. A round consists of a move of Spoiler followed
by a move of Duplicator. At each move Spoiler takes a pebble, say $p_i$, selects one of
the structures $G$ or $G'$, and places $p_i$ on an element of this structure. In response
Duplicator should place the other copy of $p_i$ on an element of the other structure. It
is allowed to move previously placed pebbles to another element and to place more
than one pebble on the same element.

After each round of the game, for $1 \le i \le l$ let $x_i$ (resp. $x_i'$) denote the element
of $G$ (resp. $G'$) occupied by $p_i$, irrespective of which player placed the pebble
on this element. If $p_i$ is off the board at this moment, $x_i$ and $x_i'$ are undefined. If
after each of the $r$ rounds it is true that

$$x_i = x_j \text{ iff } x_i' = x_j' \text{ for all } 1 \le i < j \le l,$$

and the component-wise correspondence from $(x_1, \ldots, x_l)$ to $(x_1', \ldots, x_l')$ is a partial
isomorphism from $G$ to $G'$, this is a win for Duplicator; otherwise the winner is Spoiler.

The $k$-alternation Ehrenfeucht game on $G$ and $G'$ is a variant of the game in
which Spoiler is allowed to switch from one structure to another at most $k$ times
during the game, i.e., in at most $k$ rounds he can choose the structure other than
that in the preceding round.

The main technical tool we will use is given by the following statement. 
Proposition 2.11

1) $L(G, G')$ equals the minimum $l$ such that Spoiler has a winning strategy in
$\mathrm{Ehr}_r^l(G, G')$ for some $r$.

2) $D(G, G')$ equals the minimum $r$ such that Spoiler has a winning strategy in
$\mathrm{Ehr}_r^r(G, G')$.

3) $D_k(G, G')$ equals the minimum $r$ such that Spoiler has a winning strategy in
the $k$-alternation $\mathrm{Ehr}_r^r(G, G')$.

We refer the reader to [15, Theorem 6.10] for the proof of the first claim, to [9,
Theorem 1.2.8], [15, Theorem 6.10], or [26, Theorem 2.3.1] for the second claim, and
to [23] for the third claim.

Proposition 2.11 immediately implies Propositions 2.3 and 2.9 that we earlier
proved syntactically. Note that, if we prohibit moving pebbles from one vertex to
another in $\mathrm{Ehr}_r^r(G, G')$, this will not affect the outcome of the game.

Definition 2.12 We denote the variant of $\mathrm{Ehr}_r^r(G, G')$ with moving pebbles
prohibited by $\mathrm{EHR}_r(G, G')$.
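For very small graphs, the game characterization of $D(G, G')$ can be explored by exhaustive search. The sketch below is an illustration under stated assumptions, not an algorithm from the paper: graphs are dictionaries mapping each vertex to its set of neighbours, the names `spoiler_wins` and `game_length` are hypothetical, and the search is exponential. It plays the $r$-round game in which pebbles are never moved once placed.

```python
from functools import lru_cache

def spoiler_wins(adj_g, adj_h, r):
    """True if Spoiler wins the r-round game on G and H without moving pebbles."""
    vg, vh = tuple(adj_g), tuple(adj_h)

    def consistent(xs, ys, x, y):
        # Adding the pair (x, y) must keep the pebbled vertices a partial isomorphism.
        return all(
            (x == a) == (y == b) and (a in adj_g[x]) == (b in adj_h[y])
            for a, b in zip(xs, ys)
        )

    @lru_cache(maxsize=None)
    def win(xs, ys, rounds):
        if rounds == 0:
            return False  # Duplicator has survived all rounds
        # Spoiler moves in G: every answer y in H must fail now or lose later.
        for x in vg:
            if all(not consistent(xs, ys, x, y)
                   or win(xs + (x,), ys + (y,), rounds - 1) for y in vh):
                return True
        # Spoiler moves in H: every answer x in G must fail now or lose later.
        for y in vh:
            if all(not consistent(xs, ys, x, y)
                   or win(xs + (x,), ys + (y,), rounds - 1) for x in vg):
                return True
        return False

    return win((), (), r)

def game_length(adj_g, adj_h):
    """Least r for which Spoiler wins; loops forever if the graphs are isomorphic."""
    r = 1
    while not spoiler_wins(adj_g, adj_h, r):
        r += 1
    return r
```

As a small check, take $G$ to be an edge plus two isolated vertices and $G'$ a triangle plus one isolated vertex, both of order 4. Spoiler needs three rounds (he pebbles the triangle), which is within the bound $(n + 3)/2 = 3.5$ of Theorem 1.1.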

The examples below are obtained by simple application of Proposition 2.11. 

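Example 2.13

1) $L(K_m \sqcup \overline{K_m}, K_{m+1} \sqcup \overline{K_{m-1}}) = m$, $D(K_m \sqcup \overline{K_m}, K_{m+1} \sqcup \overline{K_{m-1}}) = m + 1$.

2) $L(K_{m+1} \sqcup \overline{K_m}, K_m \sqcup \overline{K_{m+1}}) = D(K_{m+1} \sqcup \overline{K_m}, K_m \sqcup \overline{K_{m+1}}) = m + 1$.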

3 Distinguishing non-isomorphic graphs 

We here prove Theorem 1.1, that will be restated as Theorem 3.15. The proof is 
based on the characterization of D(G,G') as the length of the Ehrenfeucht game 
on G and G' given by Proposition 2.11. The most essential part of the proof is 
contained in Lemma 3.13 that gives a winning strategy for Spoiler. This lemma will 
also be used in the next section to prove Theorem 1.2. We first introduce a couple 
of useful relations between vertices of a graph that will be intensively exploited in 
the course of the proofs. 
3.1 Spoiler's preliminaries

Definition 3.1 We call vertices $u$ and $v$ of a graph $G$ similar and write $u \sim v$ if
the transposition $(uv)$ is an automorphism of $G$. Let $[u]_G = \{v \in V(G) : v \sim u\}$,
$\sigma_G(u) = |[u]_G|$, and $\sigma(G) = \max_{u \in V(G)} \sigma_G(u)$. If the graph is clear from the context,
the subscript $G$ may be omitted. We will call the numbers $\sigma(v)$ and $\sigma(G)$ the
similarity indices of the vertex $v$ and the graph $G$ respectively.

In other words, u ~ v if every third vertex t is simultaneously adjacent or not to u 
and v. We will say that t separates u and v if t is adjacent to exactly one of the two 
vertices. 
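The separation criterion gives a direct way to compute similarity classes. The following minimal sketch assumes a graph is a dictionary `adj` from vertices to neighbour sets; the function names are hypothetical. Comparing each vertex with one representative per class is enough only because similarity is an equivalence relation, which is the content of Lemma 3.2.

```python
def similar(adj, u, v):
    """u ~ v: no third vertex separates u and v."""
    return u == v or adj[u] - {u, v} == adj[v] - {u, v}

def similarity_classes(adj):
    """Partition of the vertex set into classes of pairwise similar vertices."""
    classes = []
    for v in adj:
        for cls in classes:
            if similar(adj, cls[0], v):
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes

def sigma(adj):
    """Similarity index of the graph: size of a largest similarity class."""
    return max(len(c) for c in similarity_classes(adj))
```

For a triangle together with two isolated vertices, the classes are the triangle and the pair of isolated vertices, so the similarity index of the graph is 3.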

Lemma 3.2 

1) ~ is an equivalence relation on V(G). 

2) Every equivalence class [u] is a homogeneous set. 

Proof. The lemma is straightforward. Care is needed only in checking
transitivity. Given pairwise distinct $u$, $v$, and $w$, let us deduce from $u \sim v$ and
$v \sim w$ that $u \sim w$. For every $t \ne u, w$, we need to show that $u$ and $t$ are adjacent iff
so are $w$ and $t$. If $t \ne v$, this is true because both adjacencies are equivalent to the
adjacency of $v$ and $t$. There remains the case that $t = v$. Then $u$ and $v$ are adjacent
iff so are $u$ and $w$ (as $v \sim w$), which in turn holds iff so are $w$ and $v$ (as $u \sim v$). ■

Definition 3.3 Given $X \subseteq V(G)$, we will denote its complement by $\overline{X} = V(G) \setminus X$.
Let $u, v \in \overline{X}$. We write $u \equiv_X v$ if the identity map of $X$ onto itself extends to an
isomorphism from $G[X \cup \{u\}]$ to $G[X \cup \{v\}]$.

In other words, $u \equiv_X v$ if these vertices have the same adjacency pattern to $X$,
i.e., $\Gamma(u) \cap X = \Gamma(v) \cap X$. Clearly, $\equiv_X$ is an equivalence relation on $\overline{X}$.

Definition 3.4 $\mathcal{C}(X)$ is the partition of $\overline{X}$ into $\equiv_X$-equivalence classes.

Let us notice a few straightforward properties of this partition. If $X_1 \subseteq X_2$,
then $\mathcal{C}(X_2)$ is a refinement of $\mathcal{C}(X_1)$ on $\overline{X_2}$. For any $X$, the $\sim$-equivalence classes
restricted to $\overline{X}$ refine the partition $\mathcal{C}(X)$.

Definition 3.5 Let $X \subseteq V(G)$. We say that $X$ is $\mathcal{C}$-maximal if $|\mathcal{C}(X \cup \{u\})| \le
|\mathcal{C}(X)|$ for any $u \in \overline{X}$.

Lemma 3.6 Let $X \subseteq V(G)$ be $\mathcal{C}$-maximal. Then the partition $\mathcal{C}(X)$ has the
following properties.

1) Every $C$ in $\mathcal{C}(X)$ is a homogeneous set.

2) If $C_1$ and $C_2$ are distinct classes in $\mathcal{C}(X)$ and have at least two elements each,
then the pair $C_1, C_2$ is homogeneous.

Proof. 1) Suppose, to the contrary, that $C$ is neither a clique nor an independent
set. Then $C$ contains three vertices $u$, $v$, and $w$ such that $u$ and $v$ are adjacent but
$u$ and $w$ are not. However, if we move $u$ to $X$, then $C$ splits into two classes, one
containing $v$ and another containing $w$. Hence the number of equivalence classes
increases at least by 1, a contradiction.

2) Suppose that this is not true, for example, $u \in C_1$ is adjacent to $v \in C_2$
but not to $w \in C_2$. If we move $u$ to $X$, then $C_2$ splits into two non-empty classes
and $C_1 \setminus \{u\}$ stays non-empty. Again the number of equivalence classes increases, a
contradiction. ■



Lemma 3.7 In every graph G, there exists a C-maximal set of vertices X such that 

|C(X)| ≥ |X| + 1. (5) 

In particular, 

|X| ≤ (|G| − 1)/2. (6) 



Proof. Such an X can be constructed, starting from X = ∅, by repeating the 
following procedure. As long as there exists u ∈ X̄ such that |C(X ∪ {u})| > |C(X)|, 
we move u to X. As soon as there is no such u, we arrive at X which is C-maximal. 
The relation (5) is true as it holds at the beginning and is preserved in each step: 
every step increases |X| by 1 and |C(X)| by at least 1. The bound (6) follows 
from (5) and the inequality |X| + |C(X)| ≤ |G|. ■ 
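
The greedy procedure from this proof can be sketched as follows. This is only an illustration under stated assumptions: the graph is non-empty and given as a dictionary of neighbour sets, and the names num_classes and c_maximal_set are hypothetical. The order in which candidate vertices are tried is arbitrary, so the set returned is one of possibly many C-maximal sets.

```python
def num_classes(adj, X):
    """Return |C(X)|: the number of distinct neighbourhoods in X seen from outside X."""
    return len({frozenset(adj[u] & X) for u in adj if u not in X})


def c_maximal_set(adj):
    """Grow X from the empty set while some vertex strictly increases |C(X)|."""
    X = set()
    improved = True
    while improved:
        improved = False
        current = num_classes(adj, X)
        for u in sorted(set(adj) - X):
            if num_classes(adj, X | {u}) > current:
                X.add(u)          # moving u into X creates at least one new class
                improved = True
                break
    return X


# Cycle on five vertices.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
X = c_maximal_set(cycle5)
print(X, num_classes(cycle5, X))
```

Each successful step adds one vertex to X and at least one class to C(X), which is exactly why relation (5) survives every iteration.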



Definition 3.8 Let X ⊆ V(G). 

Y(X) is the union of all single-element classes in C(X). 
Z(X) = V(G) \ (X ∪ Y(X)). 

D(X) is the partition of Z(X) defined by D(X) = C(X ∪ Y(X)). 

Clearly, D(X) refines the partition induced on Z(X) by C(X). 
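
To make Definition 3.8 concrete, here is a small sketch that computes Y(X), Z(X), and D(X) for a graph given as a dictionary of neighbour sets. The function names and the example graph are assumptions made for illustration only.

```python
from collections import defaultdict


def classes_outside(adj, X):
    """Return C(X) as a sorted list of sorted vertex lists."""
    groups = defaultdict(list)
    for u in sorted(adj):
        if u not in X:
            groups[frozenset(adj[u] & X)].append(u)
    return sorted(groups.values())


def y_z_d(adj, X):
    """Return (Y(X), Z(X), D(X)) following Definition 3.8."""
    X = set(X)
    Y = {c[0] for c in classes_outside(adj, X) if len(c) == 1}
    Z = set(adj) - X - Y
    D = classes_outside(adj, X | Y)  # C(X ∪ Y(X)) is a partition of Z(X)
    return Y, Z, D


# Path 0 - 1 - 2 together with an isolated vertex 3.
g = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(classes_outside(g, {0}))  # C(X) for X = {0}
print(y_z_d(g, {0}))
```

With X = {0}, vertex 1 forms a single-element class, so Y(X) = {1}. The vertices 2 and 3 lie in one class of C(X) but are separated by vertex 1, so D(X) properly refines the partition induced by C(X).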

Lemma 3.9 If X ⊆ V(G) is C-maximal, then every class D in D(X) consists of 
pairwise similar vertices. Thus, D(X) coincides with the partition induced on Z(X) 
by ~-equivalence classes. 

Proof. Let u and v be distinct elements of the same class D ∈ D(X). These vertices 
cannot be separated by any vertex t ∈ X ∪ Y(X) by the definition of D(X). Assume 
that they are separated by a t ∈ Z(X). Let C_1 be the class in C(X) including D 
and C_2 be the class in C(X) containing t. Since t ∉ Y(X), the class C_2 has at least 
one more element besides t. If C_1 ≠ C_2, moving t to X splits up C_1 and does not 
eliminate C_2. If C_1 = C_2, moving t to X splits up this class and splits up or does 
not affect the others. In either case, |C(X)| increases, giving a contradiction. ■ 



Definition 3.10 Let φ : X → X' be a partial isomorphism from G to G'. Let 
v ∈ V(G) \ X and v' ∈ V(G') \ X'. We call vertices v and v' φ-similar and write 
v ≡_φ v' if φ extends to an isomorphism from G[X ∪ {v}] to G'[X' ∪ {v'}]. 

Note that, if u ≡_X v and u' ≡_{X'} v', then u ≡_φ u' iff v ≡_φ v'. This makes the 
following definition correct. 

Definition 3.11 Let φ : X → X' be a partial isomorphism from G to G'. Let 
C ∈ C(X) and C' ∈ C(X'). We call C and C' φ-similar and write C ≡_φ C' if v ≡_φ v' 
for some (equivalently, for all) v ∈ C and v' ∈ C'. 

Notice that, if u ≡_φ u' and v ≡_φ v', then the relations u ≡_X v and u' ≡_{X'} v' are 
true or false simultaneously. It follows that φ-similarity is a matching between 
the classes in C(X) and the classes in C(X'), i.e., no class can have more than one 
φ-similar counterpart in the other graph. 

3.2 Spoiler's strategy 

Definition 3.12 If v ∈ V(G), the notation H = G ⊕ v means that 

• σ_G(v) ≥ 2 and 

• H is a graph obtained from G by adding a new vertex v' so that [v]_H = 
[v]_G ∪ {v'}. 

In other words, v' is similar to v, and v' is adjacent or non-adjacent to v depending 
on whether [v]_G is a clique or an independent set. Furthermore, we define G ⊕ 0v = G 
and G ⊕ lv = (G ⊕ (l − 1)v) ⊕ v for a positive integer l. 

Convention. In the sequel, writing H = G ⊕ lv we will not require the inclusion 
V(G) ⊆ V(H), assuming that H is an arbitrary isomorphic copy of G ⊕ lv. When 
considering the Ehrenfeucht game on G and H, we will in addition suppose that the 
vertex sets of these graphs are disjoint. 

Lemma 3.13 If G and G' are non-isomorphic graphs of orders n and n' respectively 
and n ≤ n', then 

D_1(G, G') ≤ (n + 5)/2 (7) 

unless G' = G ⊕ (n' − n)v for some v ∈ V(G). 

Proof. We will describe a strategy of Spoiler winning EHR_r(G, G') for r = 
⌊(n + 5)/2⌋ unless G' = G ⊕ (n' − n)v. The strategy splits the game in two phases. 

Phase 1 

Spoiler selects a C-maximal set of vertices X ⊆ V(G) such that |C(X)| ≥ |X| + 1, 
whose existence is guaranteed by Lemma 3.7. Denote s = |X|, the number of rounds 
in Phase 1, and t = |C(X)|. The bounds (5) and (6) read 

t ≥ s + 1 (8) 

and 

s ≤ (n − 1)/2. (9) 

If Duplicator loses in Phase 1, by (9) this happens within the claimed bound. 

Let X' ⊆ V(G') consist of the vertices selected in Phase 1 by Duplicator and 
φ : X → X' be the bijection defined by the condition that x and φ(x) are selected by 
the players in the same round. We assume that Phase 1 finishes without Duplicator 
losing and hence φ is a partial isomorphism from G to G'. The following useful 
observation is straightforward from Definition 3.10. 

Claim 3.13.1. Whenever after Phase 1 Spoiler selects a vertex v ∈ V(G) ∪ V(G'), 
Duplicator responds with a φ-similar vertex or otherwise immediately loses. □ 

Phase 2 

Denote the classes of C(X) by C_1, . . . , C_t and the classes of C(X') by C'_1, . . . , C'_{t'}. 
If there is a class C_i or C'_j without any φ-similar counterpart, respectively, in C(X') 
or in C(X), then Spoiler selects a vertex in this class and wins according to Claim 
3.13.1, making in total s + 2 ≤ (n + 3)/2 moves and at most one alternation between 
the graphs. From now on, we therefore assume that the φ-similarity determines a 
perfect matching between C_1, . . . , C_t and C'_1, . . . , C'_{t'}, where actually t = t'. For 
notational convenience, we assume that C_i ≡_φ C'_i for all i ≤ t. 

Furthermore, if there is a singleton C_i or C'_i whose φ-similar counterpart has 
at least two vertices, Spoiler selects such two vertices and again wins according to 
Claim 3.13.1. We will therefore assume that |C_i| = 1 iff |C'_i| = 1. Without loss of 
generality, assume that |C_i| = |C'_i| = 1 iff i ≤ q. 

Denote Y = Y(X) and Y' = Y(X'). Let C_i = {y_i} and C'_i = {y'_i} for i ≤ q. 
Thus, Y = {y_1, . . . , y_q} and Y' = {y'_1, . . . , y'_q}. Define φ* : X ∪ Y → X' ∪ Y', an 
extension of φ, by φ*(y_i) = y'_i. 

Claim 3.13.2. φ* is a partial isomorphism from G to G', unless Spoiler wins in the 
next 2 moves with no alternation, having made in total s + 2 ≤ (n + 3)/2 moves. 

Proof of Claim. For every i ≤ q, the restriction φ* : X ∪ C_i → X' ∪ C'_i is an 
isomorphism because C_i and C'_i are φ-similar. The restriction φ* : Y → Y' should 
be an isomorphism as well for the following reason. If there are i, j ≤ q such that 
y_i and y_j are adjacent but y'_i and y'_j are not or vice versa, then Spoiler wins on 
account of Claim 3.13.1 by selecting y_i and y_j. □ 

We will therefore assume that φ* is indeed a partial isomorphism from G to G'. 
Let Z = Z(X) and Z' = Z(X'). Denote the classes of D(X) by D_1, . . . , D_p and the 
classes of D(X') by D'_1, . . . , D'_{p'}. Note that 

p ≥ t − q (10) 

and 

p = t − q iff D(X) = {C_{q+1}, . . . , C_t}. 

Claim 3.13.3. Whenever in Phase 2 Spoiler selects a vertex v ∈ Z ∪ Z', Duplicator 
responds with a φ*-similar vertex or otherwise Spoiler wins in the next round at 
latest, with no alternation between G and G' in this round. 

Proof of Claim. Let u be the vertex selected by Duplicator in response to v and 
assume that u and v are not φ*-similar. Suppose that v ∈ Z' (the case of v ∈ Z is 
completely similar). If u ∉ Z, Duplicator has already lost by Claim 3.13.1. If u ∈ Z, 
there exists a vertex w ∈ X ∪ Y such that u and w are adjacent but v and φ*(w) are 
not or vice versa. If w ∈ X, again Duplicator has already lost. If w ∈ Y, then in the 
next round Spoiler selects φ*(w) and wins. □ 

Claim 3.13.3 implies that every class in D(X) or D(X') has a φ*-similar 
counterpart in, respectively, D(X') or D(X) unless Spoiler wins making in Phase 2 two 
moves and at most one alternation between the graphs. We will therefore assume 
that this is true, that is, the φ*-similarity determines a perfect matching between 
the classes D_1, . . . , D_p and D'_1, . . . , D'_{p'}, where actually p = p'. For 
notational convenience, we assume that D_i ≡_{φ*} D'_i for all i ≤ p. 

Claim 3.13.4. Unless Spoiler is able to win making 2 moves and at most 1 alternation 
in Phase 2, the following conditions are met. 

1) For every i ≤ p, D_i and D'_i are simultaneously cliques or independent sets. 

2) For every pair of distinct i, j ≤ p, G[D_i, D_j] and G'[D'_i, D'_j] are simultaneously 
complete or empty bipartite graphs. 

Proof of Claim. 1) Since D_i consists of (X ∪ Y)-similar and hence X-similar vertices, 
by Item 1 of Lemma 3.6, D_i is either a clique or an independent set. This is actually 
true for the class C ∈ C(X) including D_i. If D'_i has at least 2 vertices and is not a 
clique or an independent set simultaneously with D_i, Spoiler wins in 2 moves with 1 
alternation by selecting in D'_i two vertices which are non-adjacent in the former case 
and adjacent in the latter case. Indeed, if Duplicator responds with two vertices in 
C, those are in the opposite adjacency relation. If at least one of Duplicator's responses 
is not in C, he loses by Claim 3.13.1. Note that this argument applies, in particular, 
in the case that |D'_i| ≥ 2 but |D_i| = 1. 

2) If D_i and D_j are included in the same class C ∈ C(X), then by Item 1 of 
Lemma 3.6, G[D_i, D_j] is either complete or empty. Similarly to the above, 
G'[D'_i, D'_j] must be complete or empty respectively unless Spoiler wins in 2 moves with 
1 alternation. If D_i and D_j are included in different classes of C(X), C_1 and C_2 
respectively, then, since both C_1 and C_2 have at least 2 vertices, G[D_i, D_j] is 
either complete or empty according to Item 2 of Lemma 3.6. If G'[D'_i, D'_j] is not, 
respectively, complete or empty, then Spoiler wins in 2 moves with 1 alternation by 
selecting, respectively, non-adjacent or adjacent vertices, one in D'_i and another in 
D'_j. Indeed, if Duplicator responds with one vertex in C_1 and another in C_2, those 
are in the opposite adjacency relation. Otherwise Duplicator loses by Claim 3.13.1. 
□ 

Thus, in what follows we assume that the two conditions in Claim 3.13.4 are 
obeyed. Together with the fact that the D_i's and the D'_i's are classes of the partitions 
C(X ∪ Y) and C(X' ∪ Y'), this implies that each D_i and each D'_i consists of pairwise 
similar vertices (in the sense of Definition 3.1). Moreover, since D_i ≡_{φ*} D'_i for every 
i ≤ p, where φ* is an isomorphism from G[X ∪ Y] to G'[X' ∪ Y'], and D_i and D'_i 
are simultaneously cliques or independent sets, the graphs G and G' are isomorphic 
iff |D_i| = |D'_i| for every i ≤ p. Since G and G' are supposed to be non-isomorphic, 
there is D_i such that |D_i| ≠ |D'_i|. We will call such a D_i useful (for Spoiler). 

Claim 3.13.5. If D_i is useful and p > t − q, then Spoiler is able to win having made 
in Phase 2 at most min{|D_i|, |D'_i|} + 2 moves and at most 1 alternation between G 
and G'. If p = t − q, then min{|D_i|, |D'_i|} + 1 moves and 1 alternation suffice. 

Proof of Claim. Spoiler selects min{|D_i|, |D'_i|} + 1 vertices in the larger of the classes 
D_i and D'_i. Duplicator is forced to reply at least once with a vertex that is not 
φ*-similar. Then, according to Claim 3.13.3, Spoiler wins in the next move at latest. 
If p = t − q and hence the D-classes coincide with the C-classes, this extra move is 
not needed. This follows from Claim 3.13.1 because in this case violation of the 
φ*-similarity causes violation of the φ-similarity. □ 

Suppose that there are two useful classes, D_i and D_j. Observe that 

|D_i| + |D_j| = |Z| − Σ_{l ≠ i,j} |D_l| ≤ (n − s − q) − (p − 2) 
    ≤ (n − s − q) − (t − q − 1) ≤ n − 2s        if p > t − q, 
    ≤ (n − s − q) − (t − q − 2) ≤ n − 2s + 1    if p = t − q,        (11) 

where we use (10) and (8). It follows that one of the useful classes has at most 
(n − 2s + 1)/2 vertices if p = t − q and at most (n − 2s)/2 vertices if p > t − q. 
Therefore, if p = t − q, Spoiler wins the game in at most s + (n − 2s + 1)/2 + 1 = (n + 3)/2 
moves and, if p > t − q, in at most s + (n − 2s)/2 + 2 = (n + 4)/2 moves, which is 
within the required bound (7). 

Finally, suppose that there is a unique useful class D_m. According to Claim 
3.13.5, Spoiler is able to win in at most |D_m| + 2 moves, with the total number of 
moves s + |D_m| + 2, which is within the required bound (7) provided |D_m| = 1. Thus, 
we arrive at the conclusion that the bound (7) may fail only in the case that 
there is exactly one useful class D_m and |D_m| ≥ 2. Note that we then have n' > n 
and |D'_m| = |D_m| + (n' − n). It remains to notice that, if we remove n' − n vertices 
from D'_m, we obtain a graph isomorphic to G. It follows that G' = G ⊕ (n' − n)v 
with v ∈ D_m. ■ 

Note a direct consequence of Lemma 3.13 and Proposition 2.8, which will be 
significantly improved in the next section. 

Corollary 3.14 If σ(G) = 1, then D_1(G) ≤ (n + 5)/2. 

We now restate and prove Theorem 1.1 from the introduction. 

Theorem 3.15 If G and G' are non-isomorphic and have the same order n, then 

D_1(G, G') ≤ (n + 3)/2. (12) 

Proof. Lemma 3.13 immediately gives an upper bound of (n + 5)/2, which is a 
bit worse than we now claim. To improve it, we go through the lines of the proof of 
Lemma 3.13 but make use of the equality n = n'. The latter causes the following 
changes. Since n' = n, there must be at least two useful classes, D_i and D_j, such 
that |D_i| < |D'_i| and |D_j| > |D'_j|. If p = t − q, the bound of (n + 3)/2 has been 
actually proved, and we only need to tackle the case that p > t − q. Similarly to 
(11), we have 

2|D_i| + 2|D'_j| + 2 ≤ |D_i| + |D'_i| + |D_j| + |D'_j| ≤ 2((n − s − q) − (p − 2)) ≤ 2(n − 2s). 

It follows that at least one of |D_i| and |D'_j| does not exceed (n − 2s − 1)/2. By Claim 
3.13.5, Spoiler wins in at most s + (n − 2s − 1)/2 + 2 = (n + 3)/2 moves in total. ■ 

In the conclusion of this section, we state a lemma for further use in Section 6. 
This lemma is actually a corollary from the proof of Lemma 3.13. More precisely, 
it is a variant of Claim 3.13.5, where we take into account Lemma 3.9. 

Lemma 3.16 Let G and G' be arbitrary non-isomorphic graphs. Suppose that 
X C V(G) is C-maximal. Then Spoiler wins the 1-alternation Ehrenfeucht game on 
G and G' in at most \X\ + max^^ cr G {y) + 2 rounds. ■ 



4 Defining a graph 

Our next goal is to prove Theorem 1.2. Below this theorem is restated as Theorem 
4.6 after precisely defining the class of graphs whose members have a larger but 
efficiently computable D(G) (see Definition 4.5). The proof is based on the following 
four lemmas. 

Lemma 4.1 Let x_i (resp. x'_i) denote the vertex of G (resp. G') selected in the i-th 
round of EHR_r(G, G'). Then, as soon as a move of Duplicator violates the condition 
that x_i ~ x_j iff x'_i ~ x'_j, Spoiler wins either immediately or in the next move, possibly 
with one alternation between the graphs. 

Proof. Suppose, for example, that Duplicator selects x'_j so that x'_i and x'_j are not 
similar while x_i ~ x_j for some i < j. Suppose that the correspondence between the 
x_m's and the x'_m's, 1 ≤ m ≤ j, is still a partial isomorphism. Then there is y ∈ V(G') 
adjacent to exactly one of x'_i and x'_j. Note that such y could not be selected by the 
players previously. In the next move Spoiler selects y and wins, whatever the move 
of Duplicator is. ■ 



Lemma 4.2 Let |G| = n, v ∈ V(G), σ_G(v) = s, and G' = G ⊕ lv with l ≥ 1. Then 

s + 1 ≤ L(G, G') ≤ D_1(G, G') ≤ s + 1 + (n + 1)/(s + 1) ≤ 
    (n + 5)/2                for 2 ≤ s ≤ (n − 1)/2, 
    s + 3 − 1/(n/2 + 1)      for s ≥ n/2. 

Proof. The lower bound is given by the following strategy for Duplicator in 
Ehr*(G, G'). Whenever Spoiler selects a vertex outside [v] in either graph, Du- 
plicator selects its copy in the other graph. If Spoiler selects an unoccupied vertex 
similar to v, then Duplicator selects an arbitrary unoccupied vertex similar to v in 
the other graph. Clearly, this strategy preserves the isomorphism arbitrarily long, 
that is, is winning for every r. 
The upper bound for D_1(G, G') is ensured by the following Spoiler's strategy 
winning in the 1-alternation EHR_r(G, G') for r = ⌊s + 1 + (n + 1)/(s + 1)⌋. In the 
first round Spoiler selects a vertex in [v]_{G'}. Suppose that Duplicator replies with a 
vertex in [u]_G. 

Case 1: |[u]_G| ≤ s. Spoiler continues to select vertices in [v]_{G'}. In the (s + 1)-th 
round at latest, Duplicator selects a vertex outside [u]_G. Spoiler wins in the next 
move by Lemma 4.1, having made at most s + 2 moves and one alternation. 

Case 2: |[u]_G| ≥ s + 1. Spoiler selects one vertex in each similarity class of 
G' containing at least s + 1 vertices. Besides [v]_{G'}, there can be at most 
(n − s)/(s + 1) such classes. At latest in the ⌊(n − s)/(s + 1) + 1⌋-th round Duplicator 
selects either another vertex in a class with an already selected vertex (then Spoiler 
wins in one extra move by Lemma 4.1) or a vertex in [w]_G with |[w]_G| ≤ s. In the 
latter case Spoiler selects s more vertices in the corresponding class of G'. Duplicator 
is forced to move outside [w]_G and loses in the next move by Lemma 4.1. Altogether 
at most ⌊(n − s)/(s + 1) + 1⌋ + s + 1 ≤ s + 1 + (n + 1)/(s + 1) moves are made. 

If s ≥ n/2, the last inequality of the lemma is straightforward and, if 2 ≤ s ≤ 
(n − 1)/2, it follows from the fact that the function f(x) = x + (n + 1)/x on 
[2, (n + 1)/2] attains its maximum at the endpoints of this range. ■ 
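
The last inequality of Lemma 4.2 can be sanity-checked numerically. The sketch below assumes that the upper bound has the form s + 1 + (n + 1)/(s + 1), as the function f in the proof suggests; the names upper and check are illustrative. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def upper(n, s):
    """The assumed upper bound s + 1 + (n + 1)/(s + 1) as an exact fraction."""
    return s + 1 + Fraction(n + 1, s + 1)


def check(n):
    """Verify both cases of the final inequality for every s with 2 <= s <= n."""
    for s in range(2, n + 1):
        if 2 * s <= n - 1:
            # Case 2 <= s <= (n - 1)/2: the bound is at most (n + 5)/2.
            assert upper(n, s) <= Fraction(n + 5, 2)
        else:
            # Case s >= n/2: the bound is at most s + 3 - 1/(n/2 + 1).
            assert upper(n, s) <= s + 3 - 1 / (Fraction(n, 2) + 1)
    return True


print(all(check(n) for n in range(5, 200)))
print(upper(9, 4), Fraction(9 + 5, 2))  # equality at s = (n - 1)/2
```

For n = 9 and s = 4 both printed values coincide, which matches the fact that f attains its maximum at the endpoint x = (n + 1)/2.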

Using Lemma 4.2, Lemma 3.13 can now be refined. 

Lemma 4.3 If G and G' are graphs of orders n ≤ n', then 

D_1(G, G') ≤ (n + 5)/2 

unless 

σ(G) ≥ n/2 and G' = G ⊕ (n' − n)v for some v ∈ V(G) with σ_G(v) = σ(G). (13) 

In the latter case we have 

σ(G) + 1 ≤ L(G, G') ≤ D_1(G, G') ≤ σ(G) + 2. (14) 

Note that the condition (13) determines G' up to isomorphism with two exceptions 
if n is even. Namely, for G equal to the disjoint union of K_m and its complement, and 
for G equal to the complement of this union, there are two ways to extend G to G'. 

The gap between the bounds (14) can be completely closed. 

Lemma 4.4 Let G and G' be graphs of orders n ≤ n'. Assume the condition (13). 
Then L(G, G') = σ(G) + 1 for all such G and G', D(G, G') = σ(G) + 1 if [v]_G is an 
inclusion maximal homogeneous set, and D(G, G') = σ(G) + 2 if [v]_G is not. 

Using the inequality |[v]_G| ≥ n/2 and the mutual similarity of vertices in [v]_G, one 
can easily show that [v]_G is a maximal (with respect to inclusion) clique or 
independent set iff |[v]_G| = ω(G) or |[v]_G| = α(G) respectively. However, the former 
condition is preferable because it is efficiently verifiable. 

Proof. For simplicity we assume that [v]_G is an independent set. Otherwise we can 
switch to the complements of G and G' by Proposition 2.9. Denote s = σ(G) = |[v]_G|. 

Case 1: [v]_G is maximal independent. We show the bound D_0(G, G') ≤ s + 1 
by describing Spoiler's strategy winning the 0-alternation Ehr_{s+1}(G, G'). Spoiler 
selects s + 1 vertices in [v]_{G'}. Duplicator is forced to select at least one vertex 
u_1 ∈ [v]_G and at least one vertex u_2 ∉ [v]_G. Since [v]_G is a maximal independent 
set, u_1 and u_2 are adjacent and this is Spoiler's win. 

Case 2: [v]_G is not maximal. We first show the bound L(G, G') ≤ s + 1 by 
describing Spoiler's strategy winning Ehr^{s+1}(G, G'). As in the preceding case, Spoiler 
selects s + 1 vertices in [v]_{G'} and there are u_1 ∈ [v]_G and u_2 ∉ [v]_G selected in 
response by Duplicator. Assume that u_1 and u_2 are not adjacent for otherwise 
Duplicator loses immediately. Since u_1 and u_2 are not similar, there is 
u ∈ V(G) \ {u_1, u_2} adjacent to exactly one of u_1 and u_2. It follows that u ∉ [v]_G. 
Note that u could not be selected by Duplicator in the first s + 1 rounds without 
immediately losing. Therefore, Duplicator has selected in [v]_G at least two vertices, 
say, u_0 and u_1. In the (s + 2)-th round Spoiler moves the pebble from u_0 to u and 
wins because the counterparts of u_1 and u_2 in G' are similar and hence equally 
adjacent or non-adjacent to any counterpart of u. 

We now show the bound D(G, G') > s + 1 by describing Duplicator's strategy 
winning Ehr_{s+1}(G, G'). Whenever Spoiler selects a vertex of either graph, Duplicator 
selects its copy in the other graph, with the convention that the copy of a 
vertex in [v]_{G'} is an arbitrary unselected vertex in [v]_G. This is impossible in the 
only case when Spoiler selects s + 1 vertices all in [v]_{G'}. Then Duplicator, in addition 
to s vertices of [v]_G, selects one more vertex extending [v]_G to a larger independent 
set. ■ 

Definition 4.5 S is the class of graphs G with σ(G) > (|G| + 3)/2. S_1 is the class of 
graphs G with σ(G) > (|G| + 3)/2 such that the largest similarity class is an inclusion 
maximal homogeneous set. S_2 is the class of graphs G with σ(G) > (|G| + 1)/2 such 
that the largest similarity class is not an inclusion maximal homogeneous set. 

Theorem 4.6 L(G) ≤ (|G| + 5)/2 with the exception of all graphs in S. If G ∈ S, 
then L(G) = σ(G) + 1. 

D_1(G) ≤ (|G| + 5)/2 with the exception of all graphs in S_1 ∪ S_2. If G ∈ S_1, then 
D(G) = σ(G) + 1; if G ∈ S_2, then D(G) = σ(G) + 2. 

Proof. We prove the theorem for L(G); the proof for D(G) is completely similar. 
Recall that L(G) = max{L(G, G') : G' ≇ G}. We consider two cases. 

Case 1: σ(G) < |G|/2. For every G' ≇ G we have L(G, G') ≤ (|G| + 5)/2 by 
Lemma 4.3. Since G ∉ S, the theorem in Case 1 is true. 

Case 2: σ(G) ≥ |G|/2. If G' = G ⊕ lv for some l ≥ 1 and v ∈ V(G) with 
σ_G(v) = σ(G), then L(G, G') = σ(G) + 1 by Lemma 4.4. By the definition of S, we 
therefore have L(G, G') ≤ (|G| + 5)/2 if G ∉ S and L(G, G') > (|G| + 5)/2 if G ∈ S. 

If G = G' ⊕ lv for some l ≥ 1, G' with σ(G') ≥ |G'|/2, and v ∈ V(G') with 
σ_{G'}(v) = σ(G'), then L(G, G') = σ(G') + 1 by Lemma 4.4 and hence L(G, G') < 
σ(G) + 1. 

If G' is any other graph non-isomorphic with G, then L(G, G') ≤ (|G| + 5)/2 by 
Lemma 4.3. 

Summarizing, if G ∉ S, we have max_{G'} L(G, G') ≤ (|G| + 5)/2 and, if G ∈ S, we 
have max_{G'} L(G, G') = σ(G) + 1 > (|G| + 5)/2. Thus, in Case 2 the theorem is also 
true. ■ 

A variant of Theorem 4.6 was stated in the introduction as Theorem 1.2. To link 
the two theorems, in the latter we should set C = S_1 ∪ S_2. The efficiency statements 
of Theorem 1.2 are due to the following lemma. Referring to efficient algorithms, we 
mean random access machines whose running time on graphs of order n, represented 
by adjacency matrices, is O(n^2 log n). 

Lemma 4.7 

1) There is an efficient algorithm that, given G, finds the partition of V(G) into 
classes of pairwise similar vertices. 

2) Given G, the number σ(G) is efficiently computable. 

3) The classes S, S_1, and S_2 defined in Definition 4.5 are efficiently recognizable. 

Proof. Notice that non-adjacent vertices are in the same similarity class iff the 
corresponding rows of the adjacency matrix are identical. Thus, in order to find 
similarity classes containing more than one element, it suffices, using the standard 
O(n log n)-comparison sorting, to arrange rows of the adjacency matrices of G and 
of the complement of G in the lexicographic order. This proves Item 1. The other 
two are its direct consequences. ■ 
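
The row-sorting idea from this proof can be sketched as follows. This is an illustration only, assuming a non-empty graph given as a symmetric 0/1 adjacency matrix with zero diagonal; the function names are hypothetical, and Python's built-in comparison sort stands in for the sorting routine mentioned in the proof.

```python
from itertools import groupby


def groups_of_equal_rows(matrix):
    """Sort row indices lexicographically by row and group indices of identical rows."""
    order = sorted(range(len(matrix)), key=lambda i: matrix[i])
    return [list(g) for _, g in groupby(order, key=lambda i: matrix[i])]


def similarity_classes(matrix):
    """Return the similarity classes as a sorted list of sorted vertex lists."""
    n = len(matrix)
    complement = [[int(i != j and not matrix[i][j]) for j in range(n)] for i in range(n)]
    # Equal rows of G give independent classes; equal rows of the complement give cliques.
    big = [g for m in (matrix, complement) for g in groups_of_equal_rows(m) if len(g) > 1]
    covered = {v for g in big for v in g}
    singles = [[v] for v in range(n) if v not in covered]
    return sorted(big + singles)


def sigma(matrix):
    """Size of the largest similarity class."""
    return max(len(c) for c in similarity_classes(matrix))


star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(similarity_classes(star), sigma(star))
```

Sorting n rows takes O(n log n) row comparisons of cost O(n) each, which gives the O(n^2 log n) running time stated before the lemma.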



Remark 4.8 An analysis of the proofs shows that Theorem 4.6 is even more 
constructive: Given a graph G, one can efficiently construct its defining formula whose 
quantifier rank is as small as possible if G ∈ S_1 ∪ S_2 and does not exceed (n + 5)/2 
if G ∉ S_1 ∪ S_2. 

Remark 4.9 Note that Definition 3.1 of the similarity relation makes sense for an 
arbitrary structure. Lemma 4.1 generalizes over any class of ℒ-structures, for an 
arbitrary vocabulary ℒ: If precisely one of the relations x_i ~ x_j and x'_i ~ x'_j holds, 
then Spoiler is able to win in at most k − 1 next moves, where k is the maximum 
relation arity of ℒ. If, say, x'_i and x'_j are not similar but x_i ~ x_j, all that Spoiler 
needs to do is to exhibit u_1, . . . , u_{k−1} ∈ V(G') such that there is a sequence 
consisting of elements u_1, . . . , u_{k−1} and a variable x that satisfies some relation with 
x = x'_i and does not satisfy the same relation with x = x'_j. 

Lemma 4.2, whose proof uses only Lemma 4.1 and the definition of similarity 
classes, carries over to structures with maximum relation arity k giving bounds 

s + 1 ≤ L(G, G') ≤ D_1(G, G') ≤ s + k − 1 + (n + 1)/(s + 1). 

5 Distinguishing graphs by zero-alternation formulas 



Theorem 3.15 is proved in a stronger form: The class of distinguishing formulas is 
restricted to those with alternation number 1. We now further restrict the alterna- 
tion number to the smallest possible value of 0. In terms of the Ehrenfeucht game, 
we restrict the ability of Spoiler to alternate between graphs during play (see Item 
3 of Proposition 2.11). Moreover, let us call a formula existential (resp. universal) 
if it is in the negation normal form and all quantifiers in it are existential (resp. 
universal). It is easy to prove that, if a graph G is distinguished from another graph 
G' by a formula with alternation number 0, then G is distinguished from G' by either 
an existential or a universal formula of the same quantifier rank. Somewhat surprisingly, 
this restriction of the class of distinguishing formulas turns out to be not so essential 
in the worst case. 

Theorem 5.1 If G and G' are non-isomorphic and have the same order n, then 

D_0(G, G') ≤ (n + 5)/2. 

Proof. We will describe a strategy for Spoiler winning the 0-alternation game 
EHR_{⌊(n+5)/2⌋}(G, G'). Given a set of vertices X in a graph and a partial isomorphism 
φ : X → X' to another graph, we will use the notions introduced in Section 3.1: the 
partitions C(X) and D(X), the set Y(X), and the φ-similarity relation ≡_φ. We set 
the following notation: 

s(X) = |X|, 
t(X) = |C(X)|, 
c(X) = max{|C| : C ∈ C(X)}, 
d(X) = max{|D| : D ∈ D(X)}. 

For brevity, we will not indicate the dependence on X, writing merely Y, s, t, c, 
and d. 

At the start of the game Spoiler, over all choices of H = G or H = G', and of 
X ⊆ V(H) with t ≥ s + 1, takes one which 

(criterion 1) first maximizes s; 

(criterion 2) then, if there is still some choice, minimizes c; 

(criterion 3) finally, minimizes d. 

Let us assume H = G. As s + t ≤ n, we have s ≤ (n − 1)/2. 

Spoiler selects all vertices in X in any order. Denote the set of vertices of G' 
selected in response by Duplicator by X'. Let t' = t(X') and c' = c(X'). Assume 
that Duplicator has not lost up to now, that is, has managed to maintain the partial 
isomorphism φ : X → X'. Let C_1, . . . , C_t (resp. C'_1, . . . , C'_{t'}) be all classes in C(X) 
(resp. C(X')). 

If there is a class C_i without φ-similar counterpart in C(X'), Spoiler wins in one 
move by selecting a vertex in C_i, having made s + 1 ≤ (n + 1)/2 moves in total. 
We therefore suppose that t' ≥ t and, for every i ≤ t, the classes C_i and C'_i are 
φ-similar. Thus, t' ≥ s + 1 = s' + 1 and, by Criterion 2 of the choice of (H, X), we 
conclude that there is C'_m such that |C_i| ≤ |C'_m| for all i. 

If t' > t, define C_i = ∅ for all t < i ≤ t'. Suppose first that for some i ≤ t' 
we have |C_i| ≠ |C'_i|. As G and G' have the same order, there must be an i ≤ t' 
with |C_i| > |C'_i|. Spoiler wins by selecting |C'_i| + 1 vertices inside C_i. Observe that 
|C'_i| < |C_i| ≤ |C'_m| and that 

2|C'_i| + 1 ≤ |C'_i| + |C'_m| ≤ (n − s') − (t' − 2) ≤ n − 2s + 1. 

The total number of Spoiler's moves is therefore at most s + |C'_i| + 1 ≤ n/2 + 1, 
within the required bound. 

Suppose from now on that t' = t and for any i ≤ t we have |C_i| = |C'_i|. In 
particular, 

c = c'. (15) 

Without loss of generality, assume that |C_i| = |C'_i| = 1 precisely for i ≤ q. Note that 
Y is the union of C_1, . . . , C_q and Y' is the union of C'_1, . . . , C'_q, where Y' = Y(X'). 
Similarly to the proof of Lemma 3.13, we extend φ to φ* : X ∪ Y → X' ∪ Y' by the 
condition that φ* maps each C_i with i ≤ q onto C'_i. Similarly to Claim 3.13.2, if φ* is 
not an isomorphism from G[X ∪ Y] to G'[X' ∪ Y'], then Spoiler wins by selecting 2 
vertices in Y, having made altogether s + 2 ≤ (n + 3)/2 moves. In the sequel we 
therefore suppose that φ* is a partial isomorphism from G to G'. We will make use of 
the following observation, provable similarly to Claim 3.13.3. Let Z = V(G) \ (X ∪ Y). 

Claim 5.1.1. From now on, whenever Spoiler selects a vertex v ∈ Z, Duplicator 
responds with a φ*-similar vertex or otherwise loses in the next round at latest with 
no alternation. □ 

Let D_1, . . . , D_p (resp. D'_1, . . . , D'_{p'}) be all classes in D(X) (resp. D(X')). We now 
claim that every class D_i has a φ*-similar counterpart in D(X') or otherwise Spoiler 
wins in at most 2 next moves with no alternation, having made altogether at most 
s + 2 ≤ (n + 3)/2 moves. Indeed, if a D_i has no φ*-similar counterpart, Spoiler 
selects a vertex in this D_i and wins either immediately or in the next move by Claim 
5.1.1. We hence will assume that p' ≥ p and, for all i ≤ p, the classes D_i and D'_i are 
φ*-similar. If p' > p, define D_i = ∅ for all p < i ≤ p'. 

We now show that each class in D(X) or D(X') consists of pairwise similar 
vertices as defined in Definition 3.1. Suppose, to the contrary, that vertices u and v 
lie in the same D_i and some w is connected to one of u, v but not to the other. By 
the definition of D_i such w must lie in Z; but then moving w to X we increase t at 
least by one. Indeed, if w belongs to the same D_i, the class C_1 ∈ C(X) including D_i 
splits up into at least two subclasses, containing u and v respectively, while no class 
in C(X) disappears. If w belongs to another D_j, the class C_1 splits up as well, while 
the class C_2 ∈ C(X) including D_j still stays because it has at least two elements. 
Since the relation t ≥ s + 1 is preserved, we get a contradiction with Criterion 1 in 
the choice of (H, X). The same argument applies for D(X'). 

It follows that, for any distinct i, j ≤ p, each of G[D_i], G'[D'_i], G[D_i, D_j] and 
G'[D'_i, D'_j] is either complete or empty. The same is true about every G[D_i, {v}] and 
G'[D'_i, {v'}] for v ∈ X ∪ Y and v' ∈ X' ∪ Y'. We now claim that, for every i ≤ p, 
j ≤ p such that j ≠ i, and v ∈ X ∪ Y, 

1) G[D_i] with at least 2 vertices is complete iff G'[D'_i] is, 

2) G[D_i, D_j] is complete iff G'[D'_i, D'_j] is, and 

3) G[D_i, {v}] is complete iff G'[D'_i, {φ*(v)}] is 

or otherwise Spoiler wins in at most 3 next moves with no alternation, having made 
altogether s + 3 ≤ (n + 5)/2 moves. For example, consider the case that G[D_i] 
has at least 2 vertices and is complete but G'[D'_i] is empty. Then Spoiler selects 
two vertices in D_i. If both Duplicator's responses are in D'_i, he loses immediately. 
Otherwise Duplicator responds at least once with a vertex which is not φ*-similar. 
Then Spoiler wins in the next move according to Claim 5.1.1. 

We therefore suppose that the above three conditions are obeyed for all i, j ≤ p 
and v ∈ X ∪ Y. It follows that, if |D_i| = |D'_i| for all i ≤ p' and, in particular, p' = p, 
then G and G' should be isomorphic. Since this is not so, there is l ≤ p' such that 
|D_l| ≠ |D'_l|. As G and G' have the same order, we can assume that 

|D_l| > |D'_l|. (16) 

Note that p' > t − q for else the D'-classes are identical to the C'-classes, which 
contradicts (16). Thus 

p' ≥ s + 2 − q. (17) 

It follows from (15) and Criterion 3 of the choice of (H, X) that there exists 
k ≤ p' such that |D_i| ≤ |D'_k| for all i. We have |D'_l| < |D_l| ≤ |D'_k|, so 

2|D'_l| + 1 ≤ |D'_l| + |D'_k| ≤ (n − s − q) − (p' − 2) ≤ n − 2s, 

where the latter inequality follows from (17). 

Now, Spoiler selects |D'_l| + 1 vertices inside D_l. Duplicator cannot reply to this 
with all moves in D'_l and hence replies at least once with a vertex which is not 
φ*-similar. According to Claim 5.1.1, Spoiler wins either immediately or in the next 
round. The total number of moves is at most 

s + |D'_l| + 1 + 1 ≤ s + (n − 2s − 1)/2 + 2 = (n + 3)/2, 

as required. 



6 Defining graphs of bounded degree 

The degree of a vertex v in a graph, denoted by deg(v), is the number of edges
incident to v. The maximum degree of a graph G is defined by $\Delta(G) = \max_{v \in V(G)} \deg(v)$.
The distance between vertices v and u in a graph, dist(v, u), is the smallest number
of edges in a path from v to u. If $U \subseteq V(G)$, then $\mathrm{dist}(v, U) = \min_{u \in U} \mathrm{dist}(v, u)$.
Recall that the similarity index $\sigma_G(v)$ of a vertex v is defined in Definition 3.1.

Lemma 6.1 If v is a non-isolated vertex of a graph G, then

$\sigma_G(v) \le \Delta(G) + 1$. (18)

Proof. By Lemma 3.2, the similarity class $[v]_G$ is either a clique or an independent
set. If it is a clique, then the bound (18) is clear. Otherwise there must exist a
vertex $u \notin [v]_G$ adjacent to v. As u is adjacent to every vertex in $[v]_G$, we have
$\sigma_G(v) \le \deg(u) \le \Delta(G)$ in this case. ■
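The bound (18) is easy to test on small graphs. The following is a minimal sketch, assuming that two vertices are similar exactly when no third vertex separates them (they have the same neighbors apart from each other) and that the similarity index is the size of the similarity class. The function names and the adjacency-dictionary representation are illustrative choices, not notation from the paper.

```python
def similar(adj, u, v):
    """u ~ v iff u and v have the same neighbours once u and v are ignored."""
    return adj[u] - {v} == adj[v] - {u}


def similarity_classes(adj):
    """Partition the vertices into similarity classes.

    Similarity is an equivalence relation, so it suffices to compare a
    vertex with one representative of each class found so far.
    """
    classes = []
    for v in adj:
        for cls in classes:
            if similar(adj, v, cls[0]):
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


def check_lemma_6_1(adj):
    """True if sigma_G(v) <= Delta(G) + 1 for every non-isolated vertex v."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    return all(len(cls) <= max_deg + 1
               for cls in similarity_classes(adj)
               if adj[cls[0]])  # classes of isolated vertices are skipped


# The star K_{1,3}: the three leaves form an independent similarity class.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
# The complete graph K_4: a single class that is a clique of size Delta + 1.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
```

The two examples cover both cases of the proof: an independent class whose size is bounded by the degree of a common neighbor, and a clique class of size exactly $\Delta(G) + 1$.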

Recall that, while a formula defining a graph G distinguishes G from all non- 
isomorphic graphs, a formula identifying G distinguishes G from all non-isomorphic 
graphs of the same order. While the minimum quantifier rank of a defining formula 
with alternation number at most k is denoted by Dk(G), for an identifying formula 
it is denoted by t) k (G). By Proposition 2.8 we have 

$D_k(G) = \max\{D_k(G, G') : G' \not\cong G\}$,

and likewise, for identifying formulas, the maximum is taken only over those
$G' \not\cong G$ with $|G'| = |G|$.

Theorem 6.2 Let $d \ge 2$. If G is a graph of order n with $\Delta(G) = d$ that has no
isolated vertex and no isolated edge, then

$D_1(G) < c_d n + d^2 + d + 7/2$

for a constant c d = \ — ^d~ 2d ~ h . If G is an arbitrary graph of order n with A(G) = d, 
then the same bound holds for the identifying counterpart of $D_1(G)$.

The constant $c_d$ as stated in the theorem is far from being best possible. We do
not try to optimize it; our goal is more modest, just to show the existence of
a $c_d$ strictly less than 1/2. A bound that is tight up to an additive constant is easy
to find for d = 2. If $\Delta(G) = 2$ and G has no isolated vertices and edges, the
graph is a disjoint union of paths and cycles. Using the bounds $D(C_n, C_m) < \log_2 n + O(1)$ and
$D(P_n, P_m) < \log_2 n + O(1)$ for $m \ne n$ (e.g. [26, Theorem 2.1.2]), one can show that
$D_1(G) < n/3 + O(1)$.

Proof of Theorem 6.2. We will prove the bounds for defining and for identifying
formulas in parallel. We actually have to estimate $D_1(G, G')$ in two cases:

1) G has no isolated vertices and edges; $G' \not\cong G$ has arbitrary order.

2) G is arbitrary; $G' \not\cong G$ has order n.

In fact, almost all of the proof goes through in the most general case that $G' \not\cong G$, with
no other assumptions. Only once will we need the condition $|G'| = |G|$ for G with
isolated vertices or edges, and this will be explicitly stated.

Referring to Item 3 of Proposition 2.11, we design a strategy for Spoiler winning
the 1-alternation Ehrenfeucht game on G and G' in at most $c_d n + d^2 + d + 7/2$ rounds.
Clearly, we may assume that $\Delta(G') \le d$ for otherwise Spoiler wins in at most d + 2
moves by selecting a star $K_{1,d+1}$ in G'.

A component of a graph is a maximal connected induced subgraph. We call a
component small if it has at most $d^2 + 1$ vertices. Throughout the proof we use the
following notation.

$A \subseteq V(G)$ consists of all vertices in small components.

$B_1 \subseteq V(G)$ consists of all isolated vertices.

$B_2 \subseteq V(G)$ consists of all vertices in isolated edges.

$B = B_1 \cup B_2$.

$a = |A|$.

$b_1 = |B_1|$.

$b_2 = |B_2|/2$.

A', $B'_1$, $B'_2$, a', $b'_1$, and $b'_2$ are similarly defined for G'.
We set $\tau$ = ^ $d^{-2d-5}$. Spoiler will choose one of two strategies depending on how
large or small a is. 
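The notation above can be made concrete with a short computation. The sketch below, under the assumption that a graph is given as a dictionary mapping each vertex to its set of neighbors, computes the sets A, $B_1$, $B_2$ and the numbers a, $b_1$, $b_2$; the function names are hypothetical and serve only as an illustration.

```python
def components(adj):
    """Vertex sets of the connected components, found by depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps


def proof_notation(adj):
    """Compute A, B_1, B_2 and a, b_1, b_2 for a graph of maximum degree d."""
    d = max(len(nbrs) for nbrs in adj.values())
    comps = components(adj)
    # A: vertices in small components (at most d^2 + 1 vertices).
    A = set().union(*[c for c in comps if len(c) <= d * d + 1])
    # B_1: isolated vertices; B_2: vertices lying in isolated edges.
    B1 = set().union(*[c for c in comps if len(c) == 1])
    B2 = set().union(*[c for c in comps if len(c) == 2])
    return {"A": A, "B1": B1, "B2": B2,
            "a": len(A), "b1": len(B1), "b2": len(B2) // 2}


# Example with d = 2: an isolated vertex, an isolated edge, a path on three
# vertices (small, since 3 <= d^2 + 1 = 5) and a 6-cycle (not small).
example = {0: set(), 1: {2}, 2: {1}, 3: {4}, 4: {3, 5}, 5: {4}}
example.update({6 + i: {6 + (i - 1) % 6, 6 + (i + 1) % 6} for i in range(6)})
info = proof_notation(example)
```

Comparing `info["a"]` with $\tau n$ then tells which of the two strategies applies.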

Strategy 1 (applicable if $a \ge \tau n$)

Case 1: $G[A] \cong G'[A']$.
Spoiler plays outside A and A' using the strategy for the game on the non-isomorphic
graphs $G \setminus A$ and $G' \setminus A'$ described in the proof of Lemma 3.13. If Duplicator never
moves in A or A', Spoiler wins, according to Lemmas 3.7, 3.16, and 6.1, in at
most $(n - a - 1)/2 + (d + 1) + 2 = (n - a)/2 + d + 5/2$ moves. If Duplicator
makes a move in A or A', Spoiler, who has selected in this round a vertex v in
a component with more than $d^2 + 1$ vertices, wins by selecting a set of $d^2 + 2$
vertices that includes v and spans a connected subgraph. Thus, Spoiler needs at
most $\frac{n-a}{2} + d^2 + d + \frac72 \le (\frac12 - \frac12\tau)n + d^2 + d + \frac72$ moves to win.

Case 2: $G[A \setminus B] \not\cong G'[A' \setminus B']$.
Spoiler enforces play in $A \setminus B$ and $A' \setminus B'$. He starts in G if $G[A \setminus B]$ has at least as
many components as $G'[A' \setminus B']$ has and in G' otherwise. Without loss of generality
assume the former.

Spoiler selects one vertex in each component of $G[A \setminus B]$. This takes at most n/3
moves as every component of $G[A \setminus B]$ has at least 3 vertices. Spoiler keeps doing
so until one of the following happens.

1) Duplicator moves in B'. Then Spoiler wins in at most 2 extra moves.

2) Duplicator moves outside A'. Then Spoiler switches to G' and wins in at
most $d^2 + 1$ extra moves by selecting a connected subgraph spanned by $d^2 + 2$
vertices.

3) While Spoiler selects a vertex in a component C of $G[A \setminus B]$, Duplicator
responds with a vertex in a component C' of $G'[A' \setminus B']$ such that $C' \not\cong C$.
Then Spoiler wins in at most $d^2$ extra moves by selecting all vertices of C if
$|C| \ge |C'|$ or all vertices of C' otherwise.

It is clear that one of the three situations must happen sooner or later. Thus, Spoiler
wins in at most $n/3 + d^2 + 1$ moves with at most 1 alternation between G and G'.
If neither Case 1 nor Case 2 takes place, then

$G[B] \not\cong G'[B']$.

Case 3: $b_1 \ne b'_1$ and $b_2 \ne b'_2$.
Since $b_1 + 2b_2 \le n$, we have $b_1 \le n/3$ or $b_2 \le n/3$. If $b_1 \le n/3$, Spoiler selects
$\min\{b_1, b'_1\} + 1$ isolated vertices in the graph containing the larger number of them and
wins in the next move with alternation between the graphs. If $b_2 \le n/3$, Spoiler
selects one vertex in each of $\min\{b_2, b'_2\} + 1$ isolated edges in one of the graphs, where
this is possible, and then wins in at most 2 next moves with possibly 1 alternation.
In total, at most n/3 + 3 moves are needed.

It remains to tackle the situation when exactly one of the inequalities $b_1 \ne b'_1$
and $b_2 \ne b'_2$ is true. Let $j \in \{1, 2\}$ be the index for which $|B_j| \ne |B'_j|$ (then
$|B_{3-j}| = |B'_{3-j}|$).

Case 4: $\min\{|B_j|, |B'_j|\} \le n/3$.
Spoiler wins in at most $\min\{|B_j|, |B'_j|\} + 3 \le n/3 + 3$ moves with at most 1 alternation
similarly to Case 3.

Case 5: $\min\{|B_j|, |B'_j|\} > n/3$.
This is the only place in the proof where we need to assume that $|G'| = |G|$. We
can do so because in Case 5 the graph G must have isolated vertices or edges. It
follows that $G \setminus B$ and $G' \setminus B'$ are non-isomorphic graphs of different orders. Denote
the graph of the smaller order by H and the other graph by H'. Spoiler enforces
play on H and H' as follows: As soon as Duplicator moves in B or B', Spoiler, who
has selected in this round a vertex u in a component C of G or G' with at least 3
vertices, wins in 2 extra moves by selecting two more vertices $u_1$ and $u_2$ in C so that
u, $u_1$, and $u_2$ span a connected subgraph. Note that

$|H| \le n - b \le 2n/3$.

As long as Duplicator moves outside B and B', Spoiler uses the following strategy
for the Ehrenfeucht game on H and H'. We will now refer to Definition 3.12. If
$H' \not\cong H \oplus lv$ for any $v \in V(H)$, then Spoiler follows the strategy from Lemma 3.13
and wins in at most |H|/2 + 2 moves with at most 1 alternation. If $H' \cong H \oplus lv$ for
some $v \in V(H)$, then Spoiler follows the strategy from Lemma 4.2 and wins in at
most

$\frac{|H| + 1}{\sigma_H(v) + 1} + \sigma_H(v) + 1 \le \frac{|H|}{2} + \sigma(H) + \frac{3}{2}$

moves with at most 1 alternation between the graphs. As H has no isolated vertices,
by Lemma 6.1 we have $\sigma(H) \le \Delta(H) + 1 \le d + 1$. Thus, in Case 5 Spoiler needs at
most $|H|/2 + d + 9/2 \le n/3 + d + 9/2$ moves to win.

In any of Cases 1-5 Spoiler makes at most 1 alternation and at most $(\frac12 - \frac12\tau)n +
d^2 + d + \frac72$ moves, which is actually the claimed bound.

Strategy 2 (applicable if $a < \tau n$)
We split our description of Spoiler's strategy into four phases.
Phase 1.

Spoiler selects all vertices in the set A.
Phase 2.

Spoiler will make moves in pairs. Let $i \ge 1$. Denote the vertices selected by him
in the (2i - 1)-th and 2i-th rounds of Phase 2 by $x_i$ and $y_i$ respectively. Suppose
that Spoiler has already made 2(i - 1) moves and selected a set $X_{i-1} =
A \cup \{x_1, y_1, \ldots, x_{i-1}, y_{i-1}\} \subseteq V(G)$. Let us explain how $x_i$ and $y_i$ are now selected.
If there is a vertex $x \in V(G)$ such that

• $\mathrm{dist}(x, X_{i-1}) \ge 5$ and

• for any y with $\mathrm{dist}(x, y) \le 2$ we have $\deg(y) \le \deg(x)$,
then Spoiler selects this x for $x_i$.

Claim 6.2.1. Suppose that $x_i = x$ does exist. Then there are vertices u, y, v such
that $\{x, u\}, \{u, y\}, \{y, v\} \in E(G)$ while $\{x, y\}, \{x, v\} \notin E(G)$.

Proof of Claim. Let C be the component of G containing x. Since $A \subseteq X_{i-1}$ and
$\mathrm{dist}(x, X_{i-1}) \ge 5$, we have $x \notin A$, so C is not small. It should contain a
vertex v with dist(x, v) = 3 for else every vertex of C would be at distance at most
2 from x and hence C would have at most $1 + d + d(d - 1) = d^2 + 1$ vertices. Let
(x, u, y, v) be an arbitrary path of length 3 from x to v. The vertices u, y, v are as desired. □

If $x_i = x$ is selected, Spoiler chooses some u, y, v as in the claim and takes the y
for $y_i$.

If no such x exists, Phase 2 ends. Suppose that Phase 2 lasts 2r rounds. Recall
that the partition C(X), where $X \subseteq V(G)$, is defined by Definition 3.4.

Claim 6.2.2. $|C(X_i)| \ge |C(X_{i-1})| + 3$ if i < r and $|C(X_r)| \ge |C(X_{r-1})| + 2$.

Proof of Claim. We will show that, if we extend $X_{i-1}$ to $X_i$, one of the classes in
$C(X_{i-1})$ splits up into at least 4 parts if i < r and into at least 3 parts if i = r.

By the choice of u, $y = y_i$, and v, we have $\mathrm{dist}(x_i, u) = 1$ and $\mathrm{dist}(x_i, v) = 3$.
Since $\mathrm{dist}(x_i, X_{i-1}) \ge 5$, neither u nor v is in $X_i$. Note that u is adjacent to both
$x_i$ and $y_i$, while v is adjacent to $y_i$ but not to $x_i$.

Note also that $\Gamma(x_i) \setminus \Gamma(y_i) \ne \emptyset$ because $x_i$ has no fewer neighbors than $y_i$ has and
v is a neighbor of $y_i$ but not of $x_i$. Thus, there is a vertex w adjacent to $x_i$ but not
to $y_i$. Like u and v, we have $w \notin X_i$.

Thus, u, v, and w belong to pairwise distinct classes of $C(X_i)$. If we assume that
i < r, we are able to find a vertex in yet another class. Indeed, consider $x = x_{i+1}$.
Since $\mathrm{dist}(x, X_i) \ge 5$, this vertex is adjacent neither to $x_i$ nor to $y_i$.

On the other hand, each of u, v, w, and x is at distance at least 2 from $X_{i-1}$.
Therefore all of them are in the same class of $C(X_{i-1})$. □

Phase 3.

As long as possible, Spoiler extends $X = X_r$ by one vertex so that |C(X)| increases
at least by 1. Phase 3 ends as soon as Spoiler arrives at a C-maximal set in the
sense of Definition 3.5.

Suppose that Phase 3 lasts h rounds. At the end of this phase we therefore have

$|X| = a + 2r + h$
and

$|C(X)| \ge 1 + 3(r - 1) + 2 + h = 3r + h$.

It follows that $|C(X)| \ge |X| + r - a$ and hence $n \ge |X| + |C(X)| \ge 2|X| + r - a$.
We conclude that

$|X| \le \frac{n + a - r}{2}$.

Phase 4.

Spoiler now plays precisely as in Phase 2 of the strategy designed in the proof of
Lemma 3.13. By Lemma 3.16, with Lemma 6.1 taken into account, Spoiler wins
making totally at most

$|X| + d + 3 \le \frac{n + a - r}{2} + d + 3 \le \frac{n + \tau n - r}{2} + d + 3$ (19)

moves. It therefore remains to show that the duration of Phase 2, controlled by r,
is linearly related to n (the parameter $\tau$ is chosen small enough).

Claim 6.2.3. Let $V_k = \{x \in V(G) : \mathrm{dist}(x, X_r) \ge 2k + 3\}$. Then $V_{d+1} = \emptyset$.

Proof of Claim. Assume, to the contrary, that $V_{d+1} \ne \emptyset$. We will show that then
there is $x_{r+1}$ such that $\mathrm{dist}(x_{r+1}, X_r) \ge 5$ and every y at distance at most 2 from
$x_{r+1}$ has no larger degree, contradicting the fact that Phase 2 lasts 2r rounds. Let
$d_i = \max\{\deg(x) : x \in V_i\}$. If no $z_i \in V_i$ with $\deg(z_i) = d_i$ can be taken for $x_{r+1}$,
then $d_{i+1} < d_i$. Indeed, for any such $z_i$ there is y such that $\mathrm{dist}(z_i, y) \le 2$ and
$\deg(y) > \deg(z_i)$. The latter implies that $y \notin V_i$, i.e., $\mathrm{dist}(y, X_r) \le 2i + 2$. It
follows that $\mathrm{dist}(z_i, X_r) \le 2i + 4$, hence $z_i \notin V_{i+1}$ and $d_{i+1} < d_i$. Since the chain
$d_1 > d_2 > d_3 > \ldots$ can have length at most d, some $z_i$ with $i \le d + 1$ can be taken
for $x_{r+1}$. This contradiction proves the claim. □

Thus, $|V(G) \setminus V_{d+1}| = n$. By the definition of $V_{d+1}$ we have

$|V(G) \setminus V_{d+1}| \le |X_r|(1 + d + d(d - 1) + d(d - 1)^2 + \ldots + d(d - 1)^{2d+3}) < |X_r| d^{2d+5}$

(note that $d \ge 2$, which follows from the assumption that a < n). Putting it
together, under the assumption that $a < \tau n$, we obtain

$\tau n + 2r > a + 2r = |X_r| > n/d^{2d+5}$,

which implies that

$r > n(d^{-2d-5} - \tau)/2$.
Substituted in (19), this shows that Spoiler wins in at most

$(\frac12 + \frac34\tau - \frac14 d^{-2d-5})n + d + 3$

moves, which is within the required bound. ■
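The counting estimate used in the last step, and the count $d^2 + 1$ from Claim 6.2.1, are both instances of the usual bound on the number of vertices within a given distance of a vertex in a graph of maximum degree d. As a quick sanity check, the following sketch evaluates that sum numerically; the helper name `ball_bound` is an illustrative choice.

```python
def ball_bound(d, radius):
    """Upper bound on the number of vertices within distance `radius` of a
    fixed vertex in a graph of maximum degree d:
    1 + d + d(d-1) + ... + d(d-1)^(radius-1)."""
    return 1 + sum(d * (d - 1) ** j for j in range(radius))


# Radius 2 gives the size bound d^2 + 1 of a small component.
radius_two_ok = all(ball_bound(d, 2) == d * d + 1 for d in range(2, 10))

# Radius 2d + 4 gives the sum appearing before the factor d^(2d+5).
geometric_ok = all(ball_bound(d, 2 * d + 4) < d ** (2 * d + 5)
                   for d in range(2, 10))
```

For d = 2 the sum equals 17, far below $2^9 = 512$, which illustrates how much slack the estimate leaves.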
7 The worst case dimension of the Weisfeiler-Lehman
algorithm

The main purpose of this, mostly expository, section is to give a self-contained
proof of an upper bound for the dimension of the Weisfeiler-Lehman algorithm for
the graph isomorphism problem. The dimension is an important parameter of the
algorithm. On the one hand, the higher the dimension chosen, the longer the algorithm
runs. On the other hand, a small dimension may not suffice to compute the right
output. Our goal will be to show that, on input graphs of order n, the dimension
$\lfloor (n + 1)/2 \rfloor$ suffices. Another job we do here is to compute an explicit constant in
the Cai-Fürer-Immerman bound (2).

We begin with a description of the algorithm.

7.1 Definitions and notation 

Given an ordered k-tuple of vertices $\bar u = (u_1, \ldots, u_k) \in V(G)^k$, let $s = s(\bar u)$ be
the number of distinct components in $\bar u$ and define a function $F_{\bar u} : \{1, \ldots, k\} \to
\{1, \ldots, s\}$ by $F_{\bar u}(i) = |\{u_1, \ldots, u_i\}|$. Furthermore, let $G_{\bar u}$ be the graph on the vertex
set $\{1, \ldots, s\}$ with vertices a and b adjacent iff, for the smallest i and j such that
$F_{\bar u}(i) = a$ and $F_{\bar u}(j) = b$, $u_i$ and $u_j$ are adjacent in G. The pair $(F_{\bar u}, G_{\bar u})$ is the
isomorphism type of $\bar u$ and will be denoted by $[\bar u]$.

If $w \in V(G)$ and $i \le k$, we let $\bar u^{i,w}$ denote the result of substituting w in place
of $u_i$ in $\bar u$.

A (vertex) coloring of a graph G is an arbitrary function $\gamma : V(G) \to C$. We
will say that C is the set of colors and that a vertex $v \in V(G)$ has color $\gamma(v)$. For
a color $c \in C$, the set $\gamma^{-1}(c)$ is its monochromatic class. A coloring $\gamma'$ refines a
coloring $\gamma$ if $\gamma'(v) = \gamma'(u)$ implies $\gamma(v) = \gamma(u)$, that is, the partition of V(G) into
$\gamma'$-monochromatic classes refines the partition into $\gamma$-monochromatic classes.

7.2 Description of the algorithm 

We distinguish two modes of the algorithm. In the canonization mode the algorithm
takes as an input a graph G and is purported to output its canonic form W(G). A
canonic form W is a graph function such that W(G) = W(G') iff $G \cong G'$.
In the isomorphism testing mode the algorithm takes as an input two graphs G and
G' and should decide if $G \cong G'$.

We now describe the k-dimensional algorithm. The algorithm assigns an initial
coloring to an input graph, then step by step refines it by iterating the color
refinement procedure, and finally, when no further color refinement is possible,
terminates and computes an output.

Initial coloring

The algorithm assigns each $\bar u \in V(G)^k$ the color $W^{k,0}_G(\bar u) = [\bar u]$ (in a suitable encoding).
Color refinement step
In the r-th step each $\bar u \in V(G)^k$ is assigned the color

$W^{k,r}_G(\bar u) = \left( W^{k,r-1}_G(\bar u), \{ (W^{k,r-1}_G(\bar u^{1,w}), \ldots, W^{k,r-1}_G(\bar u^{k,w})) : w \in V(G) \} \right)$.

In the proper Weisfeiler-Lehman algorithm the second component of $W^{k,r}_G(\bar u)$ is a
multiset rather than a set. However, in what follows we assume that it is a set. It will
be clear that the version of the algorithm we consider is weaker than the standard
one, i.e., whenever our k-dimensional version gives the right output, so does the
standard k-dimensional version (and it is not hard to show that sometimes for
the standard version a considerably smaller dimension is enough). This relaxation
makes our result only stronger as any upper bound for the dimension of the weaker
version is as well an upper bound for the dimension of the standard version.
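The refinement procedure can be sketched in a few lines for very small graphs. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the isomorphism type of a tuple is encoded by its equality and adjacency pattern (one possible "suitable encoding"), colors are renamed by integers jointly for both graphs after every step, and refinement stops when the number of color classes no longer grows. The names `atomic_type`, `rename`, `refine`, and `wl_same` are hypothetical, and the final comparison of colors of diagonal tuples follows the decision rule of the isomorphism testing mode.

```python
from itertools import product


def atomic_type(adj, tup):
    """Equality and adjacency pattern of a tuple (stand-in encoding of [u])."""
    k = len(tup)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return (tuple(tup[i] == tup[j] for i, j in pairs),
            tuple(tup[j] in adj[tup[i]] for i, j in pairs))


def rename(tables):
    """Replace colors by small integers, using one table for all graphs."""
    names = {}
    return [{t: names.setdefault(c, len(names)) for t, c in tab.items()}
            for tab in tables]


def count_classes(tables):
    return len({c for tab in tables for c in tab.values()})


def refine(graphs, k):
    """Iterate the set-based refinement step until the partition is stable."""
    colors = rename([{t: atomic_type(adj, t) for t in product(adj, repeat=k)}
                     for adj in graphs])
    while True:
        raw = []
        for adj, col in zip(graphs, colors):
            raw.append({
                # New color: old color plus the SET (not multiset) of tuples
                # of old colors obtained by substituting each vertex w.
                t: (col[t], frozenset(
                    tuple(col[t[:i] + (w,) + t[i + 1:]] for i in range(k))
                    for w in adj))
                for t in col})
        new_colors = rename(raw)
        if count_classes(new_colors) == count_classes(colors):
            return new_colors
        colors = new_colors


def wl_same(adj1, adj2, k):
    """Compare the sets of colors of the diagonal tuples (v, ..., v)."""
    c1, c2 = refine([adj1, adj2], k)
    return {c1[(v,) * k] for v in adj1} == {c2[(v,) * k] for v in adj2}


cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
perm = [3, 0, 4, 1, 5, 2]  # a relabelling of the 6-cycle
cycle6_copy = {perm[i]: {perm[(i - 1) % 6], perm[(i + 1) % 6]}
               for i in range(6)}
```

With k = 2 the sketch separates the 6-cycle from two disjoint triangles (the triangles show up in the colors of edges after one step), while the relabelled copy of the 6-cycle receives the same diagonal colors. Because every new color contains the old one, an unchanged number of classes means an unchanged partition, which justifies the stopping test.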

Proposition 7.1 If $\phi$ is an isomorphism from G to G', then for all k, r, and $\bar u \in
V(G)^k$ we have $W^{k,r}_G(\bar u) = W^{k,r}_{G'}(\phi^k(\bar u))$.

Proposition 7.2 For every pair of graphs G and G' there is a number R such that
for all $\bar u \in V(G)^k$, $\bar v \in V(G')^k$, and $r \ge R$

$W^{k,r}_G(\bar u) = W^{k,r}_{G'}(\bar v)$ iff $W^{k,R}_G(\bar u) = W^{k,R}_{G'}(\bar v)$.

Moreover, if $R_k(G, G')$ denotes the smallest such R, then $R_k(G, G') < |G|^k + |G'|^k$.

Proof. By Proposition 7.1 it suffices to prove the claim for arbitrary isomorphic
copies of G and G' and we therefore can suppose that V(G) and V(G') are disjoint.
The colorings $W^{k,r}_G$ and $W^{k,r}_{G'}$ determine a partition of the union $V(G)^k \cup V(G')^k$ into
monochromatic classes. Denote this partition by $\Pi^r$. Since the (r + 1)-th color
incorporates the r-th color, $\Pi^{r+1}$ is a subpartition of $\Pi^r$. It is clear that we eventually
have $\Pi^{R+1} = \Pi^R$ and the smallest such R is less than $|V(G)|^k + |V(G')|^k$. ■

Computing an output

Isomorphism testing mode. The algorithm terminates color refinement as soon as
the partition $\Pi^r$ of $V(G)^k \cup V(G')^k$ coincides with $\Pi^{r-1}$, i.e., after performing $r =
R_k(G, G') + 1$ refinement steps. The algorithm decides that $G \cong G'$ iff

$\{W^{k,r}_G(u^k) : u \in V(G)\} = \{W^{k,r}_{G'}(v^k) : v \in V(G')\}$, (20)

where $w^k$ denotes the diagonal vector $(w_1, \ldots, w_k)$ with all $w_i = w$.

Canonization mode. The algorithm performs $r = 2|G|^k - 1$ refinement steps and
outputs the set $\{W^{k,r}_G(u^k) : u \in V(G)\}$.

Implementation details and complexity bounds. Denote the minimum length
of the code of $W^{k,r}_G(\bar u)$ over all $\bar u$ by $\lambda(r)$. As easily seen, for any natural encoding
we should expect that $\lambda(r) \ge (k + 1)\lambda(r - 1)$. To prevent $\lambda(r)$ from increasing at an
exponential rate, before every refinement step we arrange the colors of all k-tuples in
the lexicographic order and replace each color with its number. In the canonization
mode we should keep the substitution tables of all steps. In the isomorphism testing
mode this is unnecessary but it should be stressed that color renaming must be
common for both input graphs. The straightforward implementation of the algorithm
takes time $O(k^2 n^{2k} \log^2 n)$ and space $O(k n^{2k} \log n)$, where n = |G|. In the
isomorphism testing mode, when we do not waste memory by keeping substitution
tables, the space $O(k n^k (k + \log n))$ suffices. A better implementation with time
bound $O(k^2 n^{k+1} \log n)$ is suggested in [17].

7.3 Relation to the Ehrenfeucht game 

Given numbers r, l, and $k \le l$, graphs G, G', and k-tuples $\bar u \in V(G)^k$, $\bar v \in V(G')^k$,
we use notation $\mathrm{Ehr}^l_r(G, \bar u, G', \bar v)$ to denote the r-round l-pebble Ehrenfeucht game
on G and G' with initial configuration $(\bar u, \bar v)$, i.e., the game starts with one copy of
the pebble $p_i$, $i \le k$, placed on $u_i$ and the other copy of $p_i$ placed on $v_i$.

Proposition 7.3 (Cai, Fürer, and Immerman [8]) For all $\bar u \in V(G)^k$ and
$\bar v \in V(G')^k$ the equality

$W^{k,r}_G(\bar u) = W^{k,r}_{G'}(\bar v)$ (21)
holds iff Duplicator has a winning strategy in $\mathrm{Ehr}^{k+1}_r(G, \bar u, G', \bar v)$.

Proof. We proceed by induction on r. The base case r = 0 is straightforward by
the definitions of the initial coloring and the game. Assume that the proposition is
true for r - 1 rounds.

Assume (21) and consider the Ehrenfeucht game $\mathrm{Ehr}^{k+1}_r(G, \bar u, G', \bar v)$. First of all,
the initial configuration is non-losing for Duplicator since (21) implies that $[\bar u] = [\bar v]$.
Further, Duplicator can survive in the first round. Indeed, assume that Spoiler in
this round selects a vertex a in one of the graphs, say in G. Then Duplicator selects a
vertex b in the other graph, respectively in G', such that $W^{k,r-1}_G(\bar u^{i,a}) = W^{k,r-1}_{G'}(\bar v^{i,b})$
for all $i \le k$. In particular, $[\bar u^{i,a}] = [\bar v^{i,b}]$ for all $i \le k$. Along with $[\bar u] = [\bar v]$, this
implies that $[\bar u, a] = [\bar v, b]$. Assume now that in the second round Spoiler removes the
j-th pebble, $j \le k$. Then Duplicator's task in the rest of the game is essentially to
win $\mathrm{Ehr}^{k+1}_{r-1}(G, \bar u^{j,a}, G', \bar v^{j,b})$. Since $W^{k,r-1}_G(\bar u^{j,a}) = W^{k,r-1}_{G'}(\bar v^{j,b})$, Duplicator succeeds
by the induction assumption.

Assume now that (21) is false. It follows that $W^{k,r-1}_G(\bar u) \ne W^{k,r-1}_{G'}(\bar v)$ (then
Spoiler has a winning strategy by the induction assumption) or there is a vertex a
in one of the graphs, say in G, such that for every b in the other graph, respectively
in G', $W^{k,r-1}_G(\bar u^{j_b,a}) \ne W^{k,r-1}_{G'}(\bar v^{j_b,b})$ for some $j_b \le k$. In the latter case Spoiler in his
first move places the (k+1)-th pebble on a. Let b be the vertex selected in response by
Duplicator. In the second move Spoiler will remove the $j_b$-th pebble, which implies
that from the second round on the players essentially play $\mathrm{Ehr}^{k+1}_{r-1}(G, \bar u^{j_b,a}, G', \bar v^{j_b,b})$.
By the induction assumption, Spoiler wins. ■

If $G \cong G'$, then the Weisfeiler-Lehman algorithm recognizes G and G' as isomorphic
for every dimension k. This follows from Proposition 7.1. If $G \not\cong G'$, then the
algorithm may be wrong if k is chosen too small.
Corollary 7.4 If G and G' are non-isomorphic graphs of the same order n, then the
k-dimensional algorithm recognizes G and G' as non-isomorphic iff $k \ge L(G, G') - 1$.

Proof. Duplicator has a winning strategy in $\mathrm{Ehr}^{k+1}_r(G, G')$ iff for every $a \in V(G)$
(resp. $b \in V(G')$) there is $b \in V(G')$ (resp. $a \in V(G)$) such that Duplicator has
a winning strategy in $\mathrm{Ehr}^{k+1}_{r-1}(G, a, G', b)$ or, equivalently, in $\mathrm{Ehr}^{k+1}_{r-1}(G, a^k, G', b^k)$.
It follows by Propositions 7.2 and 7.3 that the k-dimensional algorithm decides
that $G \cong G'$ iff Duplicator has a winning strategy in $\mathrm{Ehr}^{k+1}_r(G, G')$ for all r. By
Proposition 2.11, L(G, G') - 1 is equal to the maximum l such that Duplicator
has a winning strategy in $\mathrm{Ehr}^l_r(G, G')$ for every r. Therefore the decision of the
k-dimensional algorithm is correct iff $k \ge L(G, G') - 1$. ■

7.4 An upper bound on the dimension of the algorithm 

Definition 7.5 The smallest dimension of the Weisfeiler-Lehman algorithm giving
the right output on graphs G and G' of order n in the isomorphism testing mode
will be referred to as the optimum dimension of the algorithm on G and G' and
denoted by WL(G, G'). Furthermore, the optimum dimension of the algorithm on
graphs of order n is defined by

$\mathrm{WL}(n) = \max\{\mathrm{WL}(G, G') : G' \not\cong G, |G'| = |G| = n\}$.

It is easy to see that the smallest dimension of the algorithm giving the right output
on an input graph G in the canonization mode is equal to

$\max\{\mathrm{WL}(G, G') : G' \not\cong G, |G'| = |G|\}$

and hence in the worst case is equal to WL(n).

On account of Corollary 7.4 and Theorem 3.15, we immediately obtain the
following result.

Theorem 7.6 $\mathrm{WL}(n) \le (n + 1)/2$.

This bound is almost tight for the relaxed version of the algorithm that we
have dealt with in this section. However, Theorem 7.6 leaves a considerable gap if
compared with the Cai-Fürer-Immerman lower bound for WL(n), which is discussed
in the next subsection. Thus, the value of $\limsup_{n \to \infty} \mathrm{WL}(n)/n$ remains open.

7.5 The lower bound for the dimension — computing an 
explicit constant 

Cai, Fürer, and Immerman [8] prove a striking linear lower bound $\mathrm{WL}(n) \ge cn$,
without specifying the positive constant c. We are curious to draw from their
proof an explicit value of c. We do not give a complete overview of the proof, focusing
only on a few most relevant points. The following notion will be fairly useful.
Definition 7.7 Let H be a graph of order n. Given $X \subseteq V(H)$, denote $H \setminus X =
H[V(H) \setminus X]$. We call a set X a separator of H if every connected component of
the graph $H \setminus X$ has at most n/2 vertices. The number of vertices in a separator is
called its size. The minimum size of a separator of H is denoted by s(H).
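For small graphs the quantity s(H) can be computed by exhaustive search, which may help to build intuition for the definition. The following is a brute-force sketch, exponential in the order of the graph; the names `is_separator` and `min_separator_size` and the adjacency-dictionary input are assumptions made for the illustration.

```python
from itertools import combinations


def is_separator(adj, X):
    """True if every component of the graph minus X has at most n/2 vertices."""
    n = len(adj)
    rest = set(adj) - set(X)
    seen = set()
    for start in rest:
        if start in seen:
            continue
        size, stack = 0, [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u in rest and u not in seen:
                    seen.add(u)
                    stack.append(u)
        if 2 * size > n:  # component larger than n/2
            return False
    return True


def min_separator_size(adj):
    """s(H): try all vertex subsets in order of increasing size."""
    for s in range(len(adj) + 1):
        if any(is_separator(adj, X) for X in combinations(adj, s)):
            return s


path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

Removing the middle vertex of the path on 5 vertices leaves two components of size 2, so s = 1 there; in the 6-cycle a single vertex does not suffice, but two opposite vertices do.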

Cai, Fürer, and Immerman present a construction of non-isomorphic graphs G
and G' of the same order with large WL(G, G'). Both G and G' are constructed
from a suitable connected graph H with minimum vertex degree at least 2. We
will assume that H is d-regular, that is, each of its vertices has degree d (using a
non-regular H seems to give us no gain). Below we summarize the properties of the
construction. Recall that $\Delta(G)$ denotes the maximum vertex degree of a graph G.

Proposition 7.8 (Cai, Fürer, and Immerman [8]) Let $d \ge 2$ and H be a
connected d-regular graph. There are transformations of H to two graphs G = G(H)
and G' = G'(H) such that

• If $d \ge 3$, both G and G' are connected;

• $|G| = |G'| = (d + 2^{d-1})|H|$;

• $\Delta(G) = \Delta(G') = 2^{d-1}$;

• $G \not\cong G'$;

• $\mathrm{WL}(G, G') \ge s(H)$.

Thus, we need a family of d-regular graphs H with d constant and s(H) linearly
related to the order of H. The authors of [8] suggest using graphs with good expansion
properties.



Definition 7.9 Let H be a graph of order n. The vertex-expansion of H is denoted
by $i_v(H)$ and defined by

$i_v(H) = \min\left\{ \frac{|N(A)|}{|A|} : A \subset V(H),\ |A| \le \frac{n}{2} \right\}$,

where $N(A) = \bigcup_{v \in A} \Gamma(v) \setminus A$ is the neighborhood of a set A.

Lemma 7.10 For a graph H of order n we have

$s(H) \ge \frac{i_v(H)}{3 + i_v(H)}\, n$.

Proof. Let X be a separator of H with the smallest size s = s(H). Denote the
largest size of a connected component of $H \setminus X$ by m and recall that $m \le n/2$.
There is a set $A_1 \subseteq V(H) \setminus X$ with $|A_1| = m$ such that $H[A_1]$ is a connected
component of $H \setminus X$ and, as it is not hard to see, there is a set $A_2 \subseteq V(H) \setminus X$
with $\frac{n-s}{2} - \frac{m}{2} \le |A_2| \le \frac{n-s}{2}$ such that $H[A_2]$ is a union of connected components of
$H \setminus X$. Note that $\max\{m, \frac{n-s}{2} - \frac{m}{2}\} \ge \frac{n-s}{3}$ and therefore for $A_i$, one of the sets $A_1$
and $A_2$, we have

$\frac{n-s}{3} \le |A_i| \le \frac{n}{2}$.

By the definition of the vertex expansion,

$|N(A_i)| \ge i_v(H)|A_i| \ge i_v(H) \frac{n-s}{3}$.

Since $A_i$ is a union of connected components of $H \setminus X$, we have $N(A_i) \subseteq X$ and
hence $|N(A_i)| \le s$. Thus, we obtain the relation

$s \ge i_v(H) \frac{n-s}{3}$.

Resolving it with respect to s, we arrive at the required estimate. ■

Thus, we need d-regular graphs with large vertex-expansion. The best examples
we could find in the literature come from the known edge-expansion results.

Definition 7.11 Let H be a graph of order n. The edge-expansion (or the isoperimetric
number) of H is denoted by $i_e(H)$ and defined by

$i_e(H) = \min\left\{ \frac{e(A, V(H) \setminus A)}{|A|} : A \subset V(H),\ |A| \le \frac{n}{2} \right\}$,

where e(A, B) denotes the number of edges in H with one end vertex in A and the
other in B.

For a d-regular graph H it is straightforward that $i_v(H) \ge i_e(H)/d$. We are able
to improve this relation for 3-regular (or cubic) graphs.
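Both expansion parameters can be evaluated by brute force on small graphs. The sketch below assumes the standard definitions, namely the minimum over nonempty vertex sets A with $|A| \le n/2$ of $|N(A)|/|A|$ for the vertex-expansion and of the number of edges leaving A divided by |A| for the edge-expansion. The function name `expansions` is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def expansions(adj):
    """Return (vertex-expansion, edge-expansion) by exhaustive search over
    all nonempty vertex sets A with |A| <= n/2."""
    n = len(adj)
    best_v = best_e = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(adj, size):
            A = set(subset)
            # N(A): vertices outside A with a neighbour in A.
            boundary = set().union(*(adj[v] for v in A)) - A
            # Number of edges with exactly one end vertex in A.
            cut = sum(len(adj[v] - A) for v in A)
            ratio_v = Fraction(len(boundary), size)
            ratio_e = Fraction(cut, size)
            best_v = ratio_v if best_v is None else min(best_v, ratio_v)
            best_e = ratio_e if best_e is None else min(best_e, ratio_e)
    return best_v, best_e


# Two cubic graphs: the complete graph K_4 and the bipartite graph K_{3,3}.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
k33 = {v: ({3, 4, 5} if v < 3 else {0, 1, 2}) for v in range(6)}
iv4, ie4 = expansions(k4)
iv33, ie33 = expansions(k33)
```

For $K_4$ the values are 1 and 2, so the relation between vertex- and edge-expansion of cubic graphs with the factor 1/2 holds with equality there, while the general factor 1/d = 1/3 is not attained.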

Lemma 7.12 If H is a cubic graph, then $i_v(H) \ge i_e(H)/2$.

Proof. If H is disconnected, then one of its connected components occupies no more
than a half of the vertices and hence $i_v(H) = i_e(H) = 0$. Suppose that H is connected.

Of all $A \subset V(H)$ with $|A| \le |H|/2$ and $|N(A)|/|A| = i_v(H)$, take one minimizing
e(A, N(A)). Let us show that every vertex $x \in N(A)$ sends at most 2 edges to A.
This will give us the desired relation because in this case

$i_e(H) \le \frac{e(A, N(A))}{|A|} \le \frac{2|N(A)|}{|A|} = 2 i_v(H)$.

Suppose, to the contrary, that some $x \in N(A)$ sends 3 edges to A.

Consider an arbitrary $y \in A$. Let $A_y = (A \setminus \{y\}) \cup \{x\}$. If y sends 3 edges
to $N(A) \setminus \{x\}$, then $N(A_y) = N(A) \setminus \{x\}$ has fewer vertices than N(A) has while
$|A_y| = |A|$. Therefore $|N(A_y)|/|A_y| < i_v(H)$, a contradiction. If y sends 1 or 2 edges
to $N(A) \setminus \{x\}$, then $N(A_y) \subseteq (N(A) \setminus \{x\}) \cup \{y\}$. It follows that $|N(A_y)| \le |N(A)|$
and it should be $|N(A_y)|/|A_y| = i_v(H)$. However, $e(A_y, N(A_y)) \le e(A, N(A)) - 1$,
contradicting the choice of A.

We conclude that every $y \in A$ sends no edges to $N(A) \setminus \{x\}$. It follows that $A \cup
\{x\}$ spans a proper connected component of H, a contradiction to the connectivity
of H. ■
Using Lemmas 7.10 and 7.12, from Proposition 7.8 we easily obtain the following
consequence.

Proposition 7.13 Let $i_e(3, m)$ denote the maximum edge-expansion of a connected
cubic graph of order m. Then there are non-isomorphic graphs G and G' both of
order 7m with maximum degree 4 such that

$\mathrm{WL}(G, G') \ge \frac{i_e(3, m)}{6 + i_e(3, m)}\, m$. ■

It seems that the best known lower bounds for $i_e(3, m)$ are obtained by examining
random cubic graphs. The edge-expansion of a random cubic graph was studied by
Buser [7], Bollobás [6], and others with the best lower bound as follows.

Proposition 7.14 (Kostochka and Melnikov [19, 20]) Let H be a random
cubic graph of order m. If m is sufficiently large, then with probability 1 - o(1) we
have

$i_e(H) \ge \frac{1}{4.95}$.

It follows that $i_e(3, m) \ge \frac{1}{4.95}$, where m is supposed to be large enough. For the
graphs G and G' as in Proposition 7.13 we therefore have

$\mathrm{WL}(G, G') \ge \frac{n}{7(1 + 6 \cdot 4.95)} > 0.00465\, n$,
where n = 7m is the order of the graphs. Thus, the constant in question is evaluated.
Proposition 7.15 WL(n) > 0.00465 n for infinitely many n. ■
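The numerical constant can be rechecked with exact arithmetic. The sketch below assumes the chain of estimates used above: an edge-expansion of at least 1/4.95 for the cubic graph of order m, the bound $i_e m/(6 + i_e)$ on WL(G, G'), and n = 7m. The function name `wl_lower_constant` is an illustrative choice.

```python
from fractions import Fraction


def wl_lower_constant(edge_expansion, blowup=7):
    """Constant c with WL(G, G') >= c * n, where n = blowup * m and
    WL(G, G') >= edge_expansion / (6 + edge_expansion) * m."""
    return edge_expansion / ((6 + edge_expansion) * blowup)


# Edge-expansion 1/4.95 written as an exact fraction.
c = wl_lower_constant(Fraction(100, 495))
```

The exact value is 10/2149, that is, $1/(7(1 + 6 \cdot 4.95)) = 1/214.9$, which is slightly above 0.00465.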

Notice that, with high probability, a random cubic graph is connected. The
construction of [8], together with the logical characterization of WL(G, G') given in
the same paper, therefore has the following consequence, which is worth noting.

Proposition 7.16 For infinitely many n, there are non-isomorphic connected graphs
G and G' both of order n with maximum degree 4 such that D(G, G') > 0.00465 n. ■



8 Digraphs and binary structures 

We now extend the results of Sections 3 and 4 to directed graphs and, more
generally, structures with relation arity at most 2.
8.1 Definitions

In logical terms, a directed graph (or digraph) G is an arbitrary binary relation E
on a vertex set V(G). The edge set of G is $E(G) = \{(a, b) \in V(G)^2 : E(a, b) = 1\}$.

Thus, between two distinct vertices u and v we allow two opposite edges (u, v) and
(v, u), or only one of them, or none. We view an edge (u, v) as an arrow from u
to v. An edge (v, v), called a loop, is also allowed. From now on the stand-alone
term graph means ordinary undirected graph. Recall that the undirected graphs are
actually considered a subclass of the directed graphs wherein an undirected edge
{u, v} corresponds to two directed edges (u, v) and (v, u) and we have no loops and
no two vertices with exactly one directed edge between them.

A (vertex) colored digraph is a structure that, in addition to the binary relation,
has unary relations $U_1, \ldots, U_m$. The truth of $U_i(v)$ for a vertex v is interpreted
as coloration of v in color i. Thus, a vertex can have several colors or no color.
However, colored digraphs can be modelled as digraphs with each vertex having
exactly one color by defining new colors as conjunctions of some of $U_1, \ldots, U_m$ and
yet another new color for uncolored vertices.

Convention. Observe that digraphs can be modelled as colored loopless digraphs
by assigning a special color to the vertices with loops. To facilitate the
exposition, we will use this observation and assume throughout this section that all
digraphs are loopless.

A complete digraph has two directed edges (u, v) and (v, u) for every pair of
distinct vertices u and v. An empty digraph has no edges at all. Let G be a digraph
and $X, Y \subseteq V(G)$ be disjoint. Similarly to graphs, G[X] denotes the subdigraph
induced by G on X and G[X, Y] denotes the bipartite subdigraph induced on the
vertex classes X and Y. A set X is called complete (resp. independent) if G[X]
is complete (resp. empty). X is called homogeneous if it is complete or empty. A
bipartite subdigraph G[X, Y] is complete, independent, or (X, Y)-complete if for any
$u \in X$ and $v \in Y$ we have respectively $(u, v), (v, u) \in E(G)$; $(u, v), (v, u) \notin E(G)$;
$(u, v) \in E(G)$ but $(v, u) \notin E(G)$. It is dicomplete if it is either (X, Y)-complete
or (Y, X)-complete. A pair X, Y is called homogeneous if G[X, Y] is complete,
independent, or dicomplete.
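The four possible types of a homogeneous pair can be read off directly from the edge set. The following minimal sketch assumes a loopless digraph given as a set of ordered pairs; the function name `classify_pair` and the returned labels are illustrative choices.

```python
def classify_pair(edges, X, Y):
    """Classify the bipartite subdigraph G[X, Y] for disjoint nonempty X, Y.

    Returns "complete", "independent", "(X,Y)-complete", "(Y,X)-complete",
    or None if the pair X, Y is not homogeneous.
    """
    # For every u in X and v in Y record which of (u, v), (v, u) are edges.
    kinds = {((u, v) in edges, (v, u) in edges) for u in X for v in Y}
    if len(kinds) != 1:
        return None  # different pairs behave differently (or X, Y is empty)
    labels = {(True, True): "complete",
              (False, False): "independent",
              (True, False): "(X,Y)-complete",
              (False, True): "(Y,X)-complete"}
    return labels[kinds.pop()]


# All arrows go from {1, 2} to {3, 4}; vertex 5 is joined to 1 in both
# directions and not joined to 2.
edges = {(1, 3), (1, 4), (2, 3), (2, 4), (1, 5), (5, 1)}
```

A pair is dicomplete exactly when the function returns one of the two one-directional labels.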

A binary vocabulary has relation arities at most 2. A binary structure is a
structure over a binary vocabulary. Combinatorially, a binary structure
(U_1, ..., U_s, E_1, ..., E_t) with s unary and t binary relations can be viewed as a
complete digraph that is vertex colored in colors 1, ..., s and edge colored in colors
1, ..., t: A vertex v has color i ≤ s iff U_i(v) = 1 and an edge (u, v) has color j ≤ t
iff E_j(u, v) = 1. An isomorphism between such complete digraphs should preserve the
sets of colors of each vertex and of each edge. Given a binary structure
(U_1, ..., U_s, E_1, ..., E_t), we will often call elements of its universe vertices and
consider each E_i a digraph.

Let G = (U_1, ..., U_s, E_1, ..., E_t) be a binary structure with universe V(G). A
set X ⊆ V(G) is monochromatic if U_i(u) = U_i(v) for all u, v ∈ X and every i ≤ s.
We call X homogeneous if it is monochromatic and homogeneous in every digraph
E_j, j ≤ t. We call a pair X, Y ⊆ V(G) of disjoint sets homogeneous if it is such in
every digraph E_j, j ≤ t.

8.2 Distinguishing binary structures

We now generalize Theorem 3.15 from graphs to binary structures. Essentially the
same proof goes through with only a few substantial modifications; we indicate
these by tracing Section 3. By G we will mean, unless stated otherwise, a binary
structure (U_1, ..., U_s, E_1, ..., E_t).

Definition 3.1, as well as the other definitions in Section 3.1, makes perfect
sense for an arbitrary structure. Let us see what the similarity of vertices u and
v means in a digraph. We say that a vertex t separates u and v if precisely one of
(t, u) and (t, v) is in E(G) or precisely one of (u, t) and (v, t) is in E(G). Then u
and v are similar if no third vertex separates them and if (u, v) and (v, u) both are
in E(G) or both are not.² In a general binary structure, u and v are similar if they
have the same sets of colors (i.e., satisfy the same unary predicates) and are similar
in each digraph E_i.
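
The digraph case of this description translates directly into code. The sketch below assumes a loopless digraph given as a set of ordered pairs; the names separates and similar are ours, chosen only for illustration.

```python
def separates(edges, t, u, v):
    """True if t separates u and v: exactly one of (t,u), (t,v) is an edge,
    or exactly one of (u,t), (v,t) is an edge."""
    return (((t, u) in edges) != ((t, v) in edges)
            or ((u, t) in edges) != ((v, t) in edges))

def similar(edges, vertices, u, v):
    """True if u and v are similar in the loopless digraph (vertices, edges)."""
    if u == v:
        return True
    # (u, v) and (v, u) must be both present or both absent.
    if ((u, v) in edges) != ((v, u) in edges):
        return False
    # No third vertex may separate u and v.
    return not any(separates(edges, t, u, v)
                   for t in vertices if t != u and t != v)

# Vertices 0 and 1 both send an edge to 2 and are not joined to each other.
V = [0, 1, 2]
E = {(0, 2), (1, 2)}
print(similar(E, V, 0, 1), similar(E, V, 0, 2))
```

For a general binary structure one would, in addition, compare the unary predicates of u and v and run the same test in each digraph E_i.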

Claim 8.0.1. Lemma 3.2 holds true for digraphs and, more generally, for binary 
structures. 

Proof of Claim. The only not completely obvious part of the proof is the verification
that ~ is a transitive relation on the vertex set of a digraph. Suppose that u ~ v
and v ~ w for pairwise distinct u, v, and w. Assume, for example, that (u, v) and
(v, u) are both present in E(G). As v ~ w, we have (u, w), (w, u) ∈ E(G). Now,
as u ~ v, we have (v, w), (w, v) ∈ E(G). Also the sets of in- and out-neighbors of
u and w in V(G) \ {u, v, w} are identical (being equal to those of v). This implies
that u ~ w, proving the transitivity. □

Claim 8.0.2. Lemma 3.6 holds true for digraphs and, more generally, for binary
structures with one stipulation in Item 1. Specifically, let X ⊆ V(G) be C-maximal.
Then the partition C(X) has the following properties.

1) Every C in C(X) is a homogeneous set provided |C| ≠ 2.

2) If C_1 and C_2 are distinct classes in C(X) and have at least two elements each,
then the pair C_1, C_2 is homogeneous.

Proof of Claim. The claim easily reduces to its particular case that G is a digraph. 
So we assume the latter. 

1) Suppose, to the contrary, that C is not homogeneous.

First, suppose there are u, v ∈ C with (u, v) ∈ E(G) and (v, u) ∉ E(G). Let
w ∈ C \ {u, v}. As moving v to X cannot separate u and w (for otherwise |C(X)|
increases), we conclude that (w, v) ∈ E(G) and (v, w) ∉ E(G). Considering w as the
next move we conclude that (w, u) ∈ E(G) and (u, w) ∉ E(G). But u separates v
and w, a contradiction.

Thus we can assume that C contains three vertices u, v, and w such that
(u, v), (v, u) ∈ E(G) but (v, w), (w, v) ∉ E(G). However, if we move v to X, then C
splits into two classes, which are non-empty as they contain u and w respectively.
Hence |C(X)| increases at least by 1, a contradiction.

2) Let u, v ∈ C_1 and w, x ∈ C_2. Considering w as the next move shows that
(u, w) and (v, w) are both present or both absent; considering v we conclude
the same about (v, x) and (v, w). Hence, (u, w) is an edge if and only if (v, x) is.
Similarly, the same is true about (w, u) and (x, v). As u, v, w, x are arbitrary, the
claim follows. □

² and if u and v simultaneously make loops or do not (see Convention in Section 8.1).

Lemma 3.7 carries over with literally the same proof. 

Definition 3.12 makes perfect sense for binary structures. We now state an
analog of Lemma 3.13.

Lemma 8.1 Suppose that G and G' are non-isomorphic structures over the same
vocabulary with maximum relation arity 2. If G and G' have orders n and n'
respectively and n ≤ n', then

$$D_1(G, G') \le (n + 5)/2 \tag{22}$$

unless G' = G ⊕ (n' − n)v for some v ∈ V(G).

Proof. We go through the lines of the proof of Lemma 3.13. Phase 1 is played with
no changes. The same strategy for Phase 2 ensures that, unless Spoiler wins in 2
extra moves, we have a partial isomorphism φ* : X ∪ Y → X' ∪ Y' from G to G'.
Exactly as in the case of graphs, in Phase 2 Duplicator should obey φ* if Spoiler
moves in Y ∪ Y' and the φ*-similarity if Spoiler moves in Z ∪ Z'. In particular, Claim
3.13.3 carries over literally and we again have the φ*-similarity of the classes D_i and
D'_i for all i ≤ p. Claim 3.13.4 needs a careful revision, after which it is provable
with minor changes.

Claim 8.1.1. Unless Spoiler is able to win making 2 moves and at most 1 alternation
in Phase 2, the following conditions are met for every i ≤ p, for every pair of distinct
i, j ≤ p, and for each k ≤ t.

1) Both D_i and D'_i are monochromatic and, moreover, the sets of their colors
coincide.

2) If |D_i| ≠ 2, then the set D_i is complete or independent in the digraph E_k.
Moreover, irrespective of |D_i|, the set D_i is complete (resp. independent) in
E_k iff so is D'_i in E'_k.

3) The pairs D_i, D_j and D'_i, D'_j are simultaneously complete, independent, or
dicomplete in E_k and E'_k respectively. Moreover, in case of dicompleteness we
have the same direction of edges. □

Assume that the conditions in Claim 8.1.1 are obeyed. We now claim that G
and G' are isomorphic iff |D_i| = |D'_i| for all i ≤ p. We still cannot substantiate this
as was done in the case of graphs, because of the stipulation |D_i| ≠ 2 made
in Item 2 of Claim 8.1.1. The next claim gives the missing part of the argument.

Claim 8.1.2. Unless Spoiler is able to win making 2 moves and at most 1 alternation
in Phase 2, the following is true for every i ≤ p. Assume that |D_i| = 2 and, for
some k ≤ t, the digraph E_k[D_i] has exactly one directed edge. Then |D'_i| = 2 and
φ* extends to an isomorphism from G[X ∪ Y ∪ D_i] to G'[X' ∪ Y' ∪ D'_i].

Proof of Claim. Let C be the class in C(X) including D_i and C' be the class in
C(X') including D'_i. Claim 8.0.2 implies that C = D_i.

Let us prove that |D'_i| = 2. Note first that D'_i cannot consist of a single element.
Indeed, since |C'| ≥ 2, this would mean that C' is split up into at least two classes
of D(X'). However, at most one of them can have a φ*-similar counterpart, contradicting
the assumption that the φ*-similarity is a perfect matching between D(X) and D(X').

Let us now show that D'_i cannot have more than 2 elements. Suppose, to the
contrary, that |D'_i| > 2. Let D_i = {u, v} with (u, v) ∈ E_k and (v, u) ∉ E_k. If
E'_k[C'] contains two vertices with both or no directed edges between them, Spoiler
selects these two and wins because Duplicator is forced to reply with u and v
by the analog of Claim 3.13.1. Assume that between any two vertices of E'_k[C']
there is exactly one directed edge. It is easy to see that for some a, b, c ∈ C' we have
(a, b), (b, c) ∈ E'_k. Let Spoiler select b first. By the analog of Claim 3.13.1, Duplicator
must reply with u or v. In the next move Spoiler wins with a if Duplicator replies
with u, and wins with c otherwise.

To prove that φ* extends as desired, let Spoiler select the two vertices of D'_i ⊆ C'.
By the analog of Claim 3.13.1, Duplicator must reply with the vertices of D_i = C.
It is clear that either Spoiler wins or φ* does extend. □
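
The "easy to see" step in the proof above (a digraph with exactly one directed edge between any two of its at least three vertices contains a, b, c with (a, b) and (b, c) both edges) can be confirmed by brute force. This is only an illustrative sketch; find_two_path is a hypothetical helper name, not notation from the paper.

```python
from itertools import permutations

def find_two_path(vertices, edges):
    """Return distinct (a, b, c) with (a, b) and (b, c) in `edges`, or None."""
    for a, b, c in permutations(vertices, 3):
        if (a, b) in edges and (b, c) in edges:
            return a, b, c
    return None

# Both tournaments on three vertices (transitive and cyclic) contain such a path.
transitive = {(0, 1), (1, 2), (0, 2)}
cyclic = {(0, 1), (1, 2), (2, 0)}
print(find_two_path(range(3), transitive), find_two_path(range(3), cyclic))
```

Given such a triple, Spoiler's vertex b has both an in-neighbor a and an out-neighbor c, which is what defeats either reply u or v of Duplicator.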

Since G and G' are supposed non-isomorphic, there is D_i such that |D_i| ≠ |D'_i|.
As in the case of graphs, we call such a D_i useful. Note that, by Item 2 of Claim
8.1.1 and Claim 8.1.2, every useful D_i is homogeneous and, by Claim 8.1.1, D'_i is
also homogeneous coherently with D_i over all unary and binary relations.

The rest of the proof carries over without any changes. ■ 

The main result of this section is proved similarly to Theorem 3.15 with virtually
no change.

Theorem 8.2 Let G and G' be structures over the same vocabulary with maximum
relation arity 2. If G and G' are non-isomorphic and have the same order n, then

$$D_1(G, G') \le (n + 3)/2.$$

8.3 Defining binary structures 

Theorem 4.6 generalizes to binary structures with minor changes in the proof, as
follows. Lemmas 4.1 and 4.2 hold true for binary structures literally; see Remark
4.9, where k = 2 makes no change. As a consequence, by Lemma 8.1, Lemma 4.3
holds literally as well. Lemma 4.4, where homogeneity is redefined as in Section
8.1, also holds literally with essentially the same, easily adapted proof. Note that
Definition 4.5 makes perfect sense for the generalized homogeneity. Finally,
Theorem 4.6 holds true with literally the same proof.

9 Uniform hypergraphs



We here extend the results of Sections 3 and 4 to uniform hypergraphs.

9.1 Definitions

A k-uniform hypergraph (or k-graph) G on vertex set V(G) is a family of k-element
subsets of V(G) called (hyper)edges. As usual, the set of edges of G is denoted by
E(G). The k-graphs generalize the notion of an ordinary graph, which is actually a
2-graph. From the logical point of view, a k-graph is a structure with a single k-ary
relation E which is symmetric in the sense that E(x_1, ..., x_k) = E(x_{π(1)}, ..., x_{π(k)})
for any permutation π of the k-element index set and anti-reflexive in the sense that
E(x_1, ..., x_k) = 0 whenever at least two of the x_i's coincide. Thus, we will sometimes
write E(x_1, ..., x_k) = 1 to say that {x_1, ..., x_k} ∈ E(G) and E(x_1, ..., x_k) = 0 to
say that {x_1, ..., x_k} ∉ E(G).

Most graph-theoretic notions and all notions introduced for general structures
directly extend to k-graphs. For example, the complete k-graph of order n has n
vertices and all possible (n choose k) edges. The empty k-graph of order n has n
vertices and no edges. If X ⊆ V(G), then G[X] denotes the k-graph induced by G on X,
that is, the k-graph with the vertices in X and with all those edges A ∈ E(G) for which
A ⊆ X.

Let 0 ≤ b ≤ k − 1 and a = k − b. Given a b-vertex set B ⊆ V(G), we define the
link-graph G_B of B to be the a-graph on the vertex set V(G) \ B with the edge set

$$E(G_B) = \{A : |A| = a,\ A \cup B \in E(G)\}.$$

Clearly, G_∅ = G.

9.2 Distinguishing k-graphs

Theorem 9.1 Let k ≥ 2. If G and G' are non-isomorphic k-graphs both of order
n, then

$$D_1(G, G') \le \left(1 - \frac{1}{k}\right) n + 2k - 1.$$

If k > 2, we do not know if the bound of the theorem is tight since the only lower
bound we know for any k is (n + 1)/2. The latter is given by the same Example
2.13 as for 2-graphs, where K_m now means the complete k-graph and K̄_m means the
empty k-graph of order m.

The proof of Theorem 9.1 takes the rest of the subsection. It will be built on the
framework worked out in Section 3. However, note that Spoiler's strategy designed
in the proof of the key Lemma 9.7 will be somewhat different.

As already mentioned, Definition 3.1 of the similarity relation ~ makes perfect
sense for an arbitrary class of structures, in particular, for k-graphs. Recall
that u ~ v for vertices u and v of a k-graph G if the transposition (uv) is an
automorphism of G. Both items of Lemma 3.2 hold true for k-graphs, but Item 2
should be supplemented with more information specific to hypergraphs; see Item 4
of the following lemma.

Lemma 9.2 Let G be a k-graph.

1) Given a k-element U ⊆ V(G) and v, w ∈ V(G), let U^(vw) denote the result of
substituting v in place of w and vice versa in U. Assume that v ~ w. Then
U ∈ E(G) iff U^(vw) ∈ E(G).

2) ~ is an equivalence relation on V(G).

3) If v_i ~ w_i for all 1 ≤ i ≤ k, then E(v_1, ..., v_k) = E(w_1, ..., w_k).

4) Let v ∈ V(G) and [v]_G denote the similarity class containing v. Let B ⊆ V(G)
be disjoint with [v]_G and have at most k − 1 vertices. Then the graph G_B[[v]_G]
is either complete or empty. In particular, G[[v]_G] is either complete or empty.

5) Let H be another k-graph, X ⊆ V(H), and G = H[X]. Then [v]_H ∩ X ⊆ [v]_G
for any v ∈ X. In other words, the partition of X into the H-similarity classes
refines the partition into G-similarity classes.

Proof. 1) This item is straightforward from the definition of the ~-relation.

2) The only not completely obvious task is, given pairwise distinct vertices u, v,
and w, to infer from u ~ v and v ~ w that u ~ w. For any (k − 1)-element set of
vertices {x_1, ..., x_{k−1}} we have to check that

$$E(u, x_1, \ldots, x_{k-1}) = E(w, x_1, \ldots, x_{k-1}). \tag{23}$$

If x_i ≠ v for every i, this is easy. Assume that x_{k−1} = v. Then the relations u ~ v
and v ~ w imply by Item 1 that both the left hand side and the right hand side of
(23) are equal to E(u, w, x_1, ..., x_{k−2}).

3) Let t denote the number of common elements in {v_1, ..., v_k} and {w_1, ..., w_k}.
We proceed by reverse induction on t. If t = k, the claim is trivial. Otherwise
assume that w_i ∉ {v_1, ..., v_k}. We have E(v_1, ..., v_i, ..., v_k) = E(v_1, ..., w_i, ..., v_k)
by Item 1 and E(v_1, ..., w_i, ..., v_k) = E(w_1, ..., w_i, ..., w_k) by the induction
hypothesis.

4) This item follows directly from Item 3.

5) Let v, w ∈ X and assume that v ~ w in H. As easily follows from Item 1,
v ~ w as well in G. ■
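
Items 1 and 4 of Lemma 9.2 can be illustrated on a small instance. The sketch below assumes a k-graph stored as a set of frozensets; the helper names and the example 3-graph (all triples on {0, ..., 4} that contain vertex 4) are ours and serve only as an illustration.

```python
from itertools import combinations

def swap(U, v, w):
    """Return U^(vw): the set U with v and w exchanged."""
    return frozenset(w if x == v else v if x == w else x for x in U)

def similar(vertices, edges, k, v, w):
    """v ~ w iff the transposition (vw) preserves the edge set (cf. Item 1)."""
    return all((frozenset(U) in edges) == (swap(U, v, w) in edges)
               for U in combinations(vertices, k))

def link_graph_edges(edges, B):
    """Edges of the link-graph G_B: the sets e minus B for edges e containing B."""
    B = frozenset(B)
    return {e - B for e in edges if B <= e}

def is_complete_or_empty(S, edge_set, a):
    """Check that the a-graph with edges `edge_set`, induced on S, is complete or empty."""
    present = [frozenset(A) in edge_set for A in combinations(S, a)]
    return all(present) or not any(present)

# Example 3-graph: the edges are exactly the triples containing vertex 4.
V = range(5)
E = {frozenset({4, a, b}) for a, b in combinations(range(4), 2)}
cls = [w for w in V if similar(V, E, 3, 0, w)]  # similarity class of vertex 0
print(cls, is_complete_or_empty(cls, link_graph_edges(E, {4}), 2))
```

Here the class of 0 is {0, 1, 2, 3}; the link-graph of B = {4} restricted to it is a complete 2-graph, while G itself restricted to it is empty, in line with Item 4.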

We will use all the definitions and notation introduced in Section 3.1. In
particular, given X ⊆ V(G), we will deal with the relation ≡_X on V(G) \ X, the
partition C(X) of V(G) \ X, the sets Y(X) and Z(X), and the partition D(X) of Z(X).
In addition, we need a new relation and a new important parameter.

Definition 9.3 The relation ≈_X coincides with equality on X ∪ Y(X) and with
the relation ≡_{X∪Y(X)} on Z(X).

Definition 9.4 τ(X) = |Y(X)| + |D(X)|.

We now generalize Definition 3.12 to k-graphs. This needs more care because
the operation ⊕ for k-graphs is somewhat more subtle than for graphs.

Definition 9.5 Let G and H be k-graphs, v ∈ V(G), and l ≥ 0. The notation
H = G ⊕ lv means that the following conditions are fulfilled.

A1 σ_G(v) ≥ k;

A2 |H| = |G| + l and V(G) ⊆ V(H);

A3 H[V(G)] = G;

A4 [v]_H = [v]_G ∪ (V(H) \ V(G)).

Furthermore, we write H ≅ G ⊕ lv if there is a k-graph K such that H ≅ K and
K = G ⊕ lv.

Let us see carefully what the relation H = G ⊕ lv means for k-graphs. As is easily
seen, H is obtained from G by adding a set A of l new vertices and some new edges,
each involving at least one vertex from A. Given a k-vertex set U ⊆ V(H) having
non-empty intersection with A, we have to decide whether or not U is in E(H). The
criterion is given by Item 4 of Lemma 9.2. Specifically, assume that the intersection
B = U ∩ (V(G) \ [v]_G) contains b vertices. Let W ⊆ [v]_G with |W| = k − b. Then
U ∈ E(H) iff B ∪ W ∈ E(G) (the latter is equally true or false for any choice of W).
This provides us with a complete description of H. Thus, on account of Item
4 of Lemma 9.2, we arrive at the conclusion that H = G ⊕ lv, if it exists, is unique up
to isomorphism. It remains to prove that, if H is constructed as described above,
then indeed H = G ⊕ lv. This immediately follows from the following lemma.
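
The construction just described can be written out as a short procedure. The following is a sketch under stated assumptions: the k-graph is a set of frozensets, `cls` is the similarity class [v]_G and has at least k elements (Condition A1), and the new vertices are distinct from the old ones. The function name extend_class is hypothetical.

```python
from itertools import combinations

def extend_class(vertices, edges, k, cls, new_vertices):
    """Build H from G by adding `new_vertices` to the similarity class `cls`.

    A k-set U that meets the new vertices becomes an edge of H iff B ∪ W is an
    edge of G, where B = U minus (cls and new vertices) and W is any
    (k - |B|)-subset of cls.
    """
    cls = sorted(cls)
    outside = set(vertices) - set(cls)      # V(G) \ [v]_G
    added = set(new_vertices)               # the set A of new vertices
    all_vertices = list(vertices) + list(new_vertices)
    h_edges = set(edges)                    # H[V(G)] = G
    for U in combinations(all_vertices, k):
        U = frozenset(U)
        if not (U & added):
            continue                        # old k-sets are already decided
        B = U & outside
        W = frozenset(cls[: k - len(B)])    # any choice of W gives the same answer
        if (B | W) in edges:
            h_edges.add(U)
    return all_vertices, h_edges

# Example: the 3-graph on {0,...,4} whose edges are the triples containing 4;
# its class {0, 1, 2, 3} is extended by one new vertex 5.
G_edges = {frozenset({4, a, b}) for a, b in combinations(range(4), 2)}
H_vertices, H_edges = extend_class(range(5), G_edges, 3, [0, 1, 2, 3], [5])
print(len(H_edges))
```

In this example H again consists of all triples containing 4, now on six vertices, so the new vertex 5 joins the class of 0 as Condition A4 requires.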

Lemma 9.6 Let G and H be k-graphs, v ∈ V(G), and l ≥ 0.

1) H = G ⊕ lv iff the following two conditions are fulfilled.

B1 |H| = |G| + l and V(G) ⊆ V(H);

B2 There is D ⊆ [v]_G with |D| ≥ k such that the following is true: Every injection
ψ : V(G) → V(H) whose restriction to V(G) \ D is the identity map is a
partial isomorphism from G to H.

2) H ≅ G ⊕ lv iff the following two conditions are fulfilled.

C1 |H| = |G| + l;

C2 There is D ⊆ [v]_G with |D| ≥ k such that the following is true: There exists an
injection ψ_0 : V(G) \ D → V(H) whose every injective extension ψ : V(G) →
V(H) is a partial isomorphism from G to H.

Proof. 1) Let us show first that Conditions A1-A4 imply Conditions B1-B2. Since
B1 coincides with A2, we focus on B2. Let D ⊆ [v]_G be an arbitrary set with
|D| ≥ k, existing by A1. Assume that ψ : V(G) → V(H) is an injection whose
restriction to V(G) \ D is the identity map. Consider an arbitrary k-element set
U ⊆ V(G) and suppose that U = {u_1, ..., u_a, v_1, ..., v_b}, where each u_i ∈ V(G) \ D
and each v_i ∈ D. Then ψ(U) = {u_1, ..., u_a, w_1, ..., w_b}, where w_i = ψ(v_i). Note
that each w_i ∈ D ∪ (V(H) \ V(G)). By A4 we have v_i ~ w_i in H for all i ≤ b. By
Item 3 of Lemma 9.2, ψ(U) ∈ E(H) iff U ∈ E(H). By A3, the latter is equivalent
with U ∈ E(G), proving that ψ is indeed a partial isomorphism from G to H.

We now show that Conditions B1-B2 imply Conditions A1-A4. For A1 and
A2 this is trivial. Considering ψ being the identity map of V(G) onto itself, we
immediately obtain A3. To obtain A4, it suffices to choose an arbitrary v_0 ∈ [v]_G ∪
(V(H) \ V(G)) and prove that, for every v' ∈ [v]_G ∪ (V(H) \ V(G)), the vertices v_0
and v' are similar in H. This will give [v]_G ∪ (V(H) \ V(G)) ⊆ [v]_H. The converse
inclusion is given by Item 5 of Lemma 9.2.

Choose v_0 ∈ D. According to Item 1 of Lemma 9.2, we have to show, for any
k-vertex set U ⊆ V(H), that U ∈ E(H) iff U^(v_0 v') ∈ E(H). If both v_0 and v' or
none of them are in U, this is obvious. Otherwise it is enough to consider the case
that v_0 is in U but v' is not. Suppose that

$$U = \{u_1, \ldots, u_a, v_0, v_1, \ldots, v_b, w_1, \ldots, w_c\},$$

where all u_i ∈ V(G) \ D, all v_i ∈ D, and all w_i ∈ V(H) \ V(G).

Since |D| ≥ k, we can choose in D \ {v_0, v_1, ..., v_b} pairwise distinct vertices
v'_1, ..., v'_c. Define ψ_1 : V(G) → V(H) so that ψ_1(v'_i) = w_i for each i ≤ c and
ψ_1(x) = x for all other x ∈ V(G). If v' ∈ [v]_G \ {v'_1, ..., v'_c}, let ψ_2 be the same
as ψ_1. Notice that in this case ψ_2^{-1}(U^(v_0 v')) = (ψ_1^{-1}(U))^(v_0 v') and v_0 ~ v' in G.
If v' ∈ {v'_1, ..., v'_c} ∪ (V(H) \ V(G)), let ψ_2 coincide with ψ_1 everywhere but v_0,
where we set ψ_2(v_0) = v'. Notice that now ψ_2^{-1}(U^(v_0 v')) = ψ_1^{-1}(U). In both of the
cases we have ψ_1^{-1}(U) ∈ E(G) iff ψ_2^{-1}(U^(v_0 v')) ∈ E(G). By B2, ψ_1 and ψ_2 are partial
isomorphisms from G to H. It follows that U ∈ E(H) iff U^(v_0 v') ∈ E(H), completing
the derivation of A4.

2) Suppose that H ≅ K and K = G ⊕ lv. Then Item 2 easily follows from Item
1 applied to the k-graphs G and K. In particular, if we have Condition B2 with a set
D, then a map ψ_0 in Condition C2 can be taken to be the restriction of an isomorphism
from K to H to the set V(G) \ D. ■

Lemma 9.7 Let k ≥ 2. If G and G' are non-isomorphic k-graphs of orders n and
n' respectively and n ≤ n', then

$$D_1(G, G') \le \left(1 - \frac{1}{k}\right) n + 2k - 1 \tag{24}$$

unless G' ≅ G ⊕ (n' − n)v for some v ∈ V(G).

Proof. We will describe a strategy of Spoiler winning EHR_r(G, G') for
r = ⌊(1 − 1/k)n + 2k − 1⌋ unless G' ≅ G ⊕ (n' − n)v. The strategy splits the game
into two phases.

Phase 1

Spoiler's aim is to select in G a set of vertices X with some useful properties. He
proceeds as follows. Initially, X = ∅. If there is B ⊆ V(G) \ X with at most k − 1
elements such that τ(X ∪ B) > τ(X), Spoiler chooses such a B arbitrarily, selects all
vertices in B, and resets X to X ∪ B. As soon as there is no such B, Phase 1 ends.

Suppose that Phase 1 has now ended, during which Spoiler made s moves. The
set X = {x_1, ..., x_s} is from now on fixed and consists of the vertices selected by
Spoiler during Phase 1, where x_i is selected in the i-th round. Since every step adds
at most k − 1 vertices to X and increases τ(X) by at least 1, it is easy to see that

$$\tau(X) - 1 \ge \frac{s}{k - 1}. \tag{25}$$

Since s + τ(X) ≤ n, we have

$$s \le \left(1 - \frac{1}{k}\right)(n - 1). \tag{26}$$
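
The passage from (25) to (26) takes one line. It can be spelled out as follows, assuming that (25) has the form τ(X) − 1 ≥ s/(k − 1), which is what the description of Phase 1 yields (each step selects at most k − 1 vertices and raises τ by at least one):

```latex
\begin{align*}
n \;\ge\; s + \tau(X) \;\ge\; s + 1 + \frac{s}{k-1} \;=\; \frac{k}{k-1}\, s + 1
\quad\Longrightarrow\quad
s \;\le\; \frac{k-1}{k}\,(n-1) \;=\; \Bigl(1 - \frac{1}{k}\Bigr)(n-1).
\end{align*}
```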

We will write Y, Z, and ≈ for Y(X), Z(X), and ≈_X, omitting the argument and the
subscript X.

Claim 9.7.1. Let U = {u_1, ..., u_k} and W = {w_1, ..., w_k} be k-element subsets of
V(G) and u_i ≈ w_i for all 1 ≤ i ≤ k. Then U ∈ E(G) iff W ∈ E(G).

In particular, if u ≈ w, then u ~ w for any u, w ∈ V(G), i.e., D(X) coincides
with the partition of Z into the similarity classes.

Proof of Claim. Notice that the claim easily follows from its particular case that
u_i = w_i for i ≤ k − 1. We hence assume this.

Suppose on the contrary that U ∈ E(G) but W ∉ E(G). Let us modify X
in k − 1 steps as follows. We will denote the result of modification after the i-th
step by X_i. Initially X_0 = X. We set X_i = X_{i−1} ∪ {u_i} if u_i ∉ X_{i−1} ∪ Y(X_{i−1})
and X_i = X_{i−1} otherwise. We eventually enforce u_k ≢_{X_{k−1} ∪ Y(X_{k−1})} w_k. Since in
the i-th step no single-element ≡_{X_{i−1}}-class disappears, we have τ(X_{k−1}) > τ(X),
contradicting the assumption that Phase 1 has already ended. □

Let x'_i denote the vertex of G' selected by Duplicator in the i-th round of Phase
1 and X' = {x'_1, ..., x'_s}. Assume that Duplicator has still not lost. Thus, the map
φ : X → X' given by φ(x_i) = x'_i for i ≤ s is a partial isomorphism from G to
G'. Similarly to Claim 3.13.1 we see that Duplicator now plays under the following
constraint.

Claim 9.7.2. Whenever after Phase 1 Spoiler selects a vertex v ∈ V(G) ∪ V(G'),
Duplicator responds with a φ-similar vertex or otherwise immediately loses. □

Phase 2 

Similarly to the proof of Lemma 3.13, we conclude from Claim 9.7.2 that the
φ-similarity determines a perfect matching between the singletons in C(X) and the
singletons in C(X') unless Spoiler wins making 2 moves and at most 1 alternation
in Phase 2. Denote the classes of C(X) by C_1, ..., C_t and the classes of C(X') by
C'_1, ..., C'_{t'}. We therefore will assume that, for some q ≤ t, |C_i| = 1 iff i ≤ q,
|C'_i| = 1 iff i ≤ q, and C_i ≡_φ C'_i for all i ≤ q. Let Y' = Y(X') and
φ* : X ∪ Y → X' ∪ Y' be an extension of φ that maps C_i onto C'_i for every i ≤ q.

Claim 9.7.3. φ* is a partial isomorphism from G to G', unless Spoiler wins in the
next k moves with no alternation, having made in total s + k ≤ (1 − 1/k)n + k
moves.

Proof of Claim. Assume that there is a k-vertex set U ⊆ X ∪ Y such that exactly
one of the sets U and U' = φ*(U) is an edge of G or G' respectively. Let Spoiler
select all vertices of U. If Duplicator responds with the vertices of U', he obviously
loses. If Duplicator moves at least once outside U', he violates the φ-similarity and
loses by Claim 9.7.2. □

Assume that Duplicator is lucky to ensure that φ* is a partial isomorphism from
G to G'. Let Z = Z(X) and Z' = Z(X').

Claim 9.7.4. Whenever in Phase 2 Spoiler selects a vertex v ∈ Z ∪ Z', Duplicator
responds with a φ*-similar vertex or otherwise loses in at most k − 1 next rounds
with no alternation between G and G' in these rounds.

Proof of Claim. Let u be the vertex selected by Duplicator in response to v and
assume that u ≢_{φ*} v. Suppose that v ∈ Z' (the case of v ∈ Z is completely similar).
If u ∉ Z, Duplicator has already lost. If u ∈ Z, there exists a (k − 1)-vertex set
W ⊆ X ∪ Y such that W ∪ {u} is an edge but φ*(W) ∪ {v} is not or vice versa. Spoiler
selects the so far unselected vertices of φ*(W). Duplicator, who either responds with
vertices in W or violates the φ-similarity, loses. □

Claim 9.7.4 readily implies that every class in D(X) or D(X') has a φ*-similar
counterpart in, respectively, D(X') or D(X) unless Spoiler wins making in Phase 2 at
most k moves and at most one alternation between the graphs (selecting a vertex in
a D-class without φ*-similar counterpart and applying the strategy of Claim 9.7.4).
We will therefore assume that D(X) = {D_1, ..., D_p}, D(X') = {D'_1, ..., D'_p}, and
D_i ≡_{φ*} D'_i for all i ≤ p.

Claim 9.7.5. Unless Spoiler is able to win making 2k − 1 moves and at most 1
alternation in Phase 2, the following condition is met:

(*) For any U = {u_1, ..., u_k}, a k-vertex subset of V(G), and U' = {u'_1, ..., u'_k}, a
k-vertex subset of V(G'), such that u_i ≡_{φ*} u'_i for all i ≤ k, we have U ∈ E(G)
iff U' ∈ E(G').

Proof of Claim. Suppose, to the contrary, that there are such U and U' but exactly
one of U and U' is an edge. Spoiler selects the vertices in U'. Let U'' = {u''_1, ..., u''_k}
be the set of the respective responses of Duplicator. If u''_i ≢_{φ*} u'_i for some i ≤ k,
Spoiler wins in the next k − 1 moves according to Claim 9.7.4. Assume that u''_i ≡_{φ*} u'_i
for all i ≤ k. Together with u_i ≡_{φ*} u'_i for all i, this implies u_i ≈ u''_i for all i. Using
Claim 9.7.1, we conclude that U'' is an edge iff U is and iff U' is not. Thus, Duplicator
loses anyway. □

In the sequel we suppose that the condition (*) in Claim 9.7.5 holds (for else
Spoiler wins within the claimed bound (24) for the number of moves). Assume for
a while that |D_i| = |D'_i| for all i ≤ p. Consider an arbitrary bijection φ̄ : V(G) →
V(G') that extends φ* and maps each D_i onto D'_i. The condition (*) immediately
implies that φ̄ is an isomorphism from G to G'. Since G and G' are supposed
non-isomorphic, there must exist a D_i such that |D_i| ≠ |D'_i|. We will call such a D_i
useful.

Similarly to Claim 3.13.5, we obtain, as a corollary of Claim 9.7.4, the following
threat for Duplicator.

Claim 9.7.6. If D_i is useful, then Spoiler is able to win having made in Phase 2 at
most min{|D_i|, |D'_i|} + k moves and at most 1 alternation between G and G'. □

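Suppose now that there are two useful classes, D_i and D_j. Observe that

$$|D_i| + |D_j| = |Z| - \sum_{l \ne i,j} |D_l| \le (n - s - q) - (\tau(X) - q - 2) = n - s - \tau(X) + 2.$$

It follows that one of the useful classes has at most (n − s − τ(X) + 2)/2 vertices.
Thus, Spoiler is able to win totally in at most

$$s + \frac{n - s - \tau(X) + 2}{2} + k = \frac{n + s - \tau(X) + 2}{2} + k \tag{27}$$

rounds. From (25) and (26), we infer that

$$s - \tau(X) \le \frac{k - 2}{k - 1}\, s - 1 \le \left(1 - \frac{2}{k}\right)(n - 1) - 1.$$

The bound (27) therefore does not exceed

$$\left(1 - \frac{1}{k}\right) n + k + \frac{1}{k},$$

which is within the required bound (24).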
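
As a sanity check on this estimate, one can verify numerically that (27) never exceeds (24) whenever s and τ(X) satisfy the two constraints used above. The sketch below assumes that (25) reads τ(X) − 1 ≥ s/(k − 1) and uses only the additional constraint s + τ(X) ≤ n; the function name is ours.

```python
from fractions import Fraction

def bound_27_within_24(n, k):
    """Check (n + s - tau + 2)/2 + k <= (1 - 1/k) n + 2k - 1 for all feasible s, tau."""
    target = (1 - Fraction(1, k)) * n + 2 * k - 1          # right-hand side of (24)
    for s in range(n + 1):
        for tau in range(1, n - s + 1):                    # s + tau <= n
            if (tau - 1) * (k - 1) < s:                    # (25) would be violated
                continue
            if Fraction(n + s - tau + 2, 2) + k > target:  # value of (27)
                return False
    return True

print(all(bound_27_within_24(n, k) for n in range(1, 40) for k in range(2, 7)))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.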

Finally, suppose that there is a unique useful class D_m. According to Claim 9.7.6,
Spoiler is able to win in at most |D_m| + k moves, with the total number of moves
s + |D_m| + k within the required bound (24) provided |D_m| ≤ k − 1. Thus, we arrive
at the conclusion that the bound (24) may fail only in the case that there
is exactly one useful class D_m and |D_m| ≥ k. Let ψ_0 : V(G) \ D_m → V(G') \ D'_m be
an extension of φ* that maps each D_i, i ≠ m, onto D'_i. Let ψ : V(G) → V(G') be
an arbitrary injective extension of ψ_0 (mapping D_m into D'_m). By the condition (*)
in Claim 9.7.5, ψ is a partial isomorphism from G to G'. Take an arbitrary v ∈ D_m.
By Claim 9.7.1, we have D_m ⊆ [v]_G. On account of Item 2 of Lemma 9.6 we
conclude that G' ≅ G ⊕ (n' − n)v, completing the proof of the lemma. ■



9.3 Defining k-graphs

Theorem 4.6 carries over to k-graphs in a weaker form. The analogs of Lemmas 4.1
and 4.2 that we have for k-graphs (see Remark 4.9), together with Lemma 9.7, give
the following analog of Lemma 4.3:

Let G and G' be non-isomorphic k-graphs. Let n denote the order of G. Then
we have

$$D_1(G, G') \le \left(1 - \frac{1}{k}\right) n + 2k - 1$$

unless

$$\sigma(G) > \left(1 - \frac{1}{k}\right) n + k - 1. \tag{28}$$

In the latter case we have

$$\sigma(G) + 1 \le D_1(G, G') \le \sigma(G) + k. \tag{29}$$

Unfortunately, we cannot efficiently find the precise value of D_1(G, G') in the range
(29) for G with (28), as was done for 2-graphs in Lemma 4.4. Anyway, we have
the following result, which is rather reasonable as σ(G) is efficiently computable.

Theorem 9.8 If σ(G) ≤ (1 − 1/k) n + k − 1, then

$$D_1(G) \le \left(1 - \frac{1}{k}\right) n + 2k - 1.$$

Otherwise,

$$D_1(G) \le \sigma(G) + k$$

and this bound is at most k − 1 apart from the precise value of D(G).

10 Open questions 

1. In Section 1 we define

$$D(n) = \max\{D(G, G') : G \not\cong G',\ |G| = |G'| = n\}.$$

By Example 2.13 and Theorem 1.1,

$$\frac{n + 1}{2} \le D(n) \le \frac{n + 3}{2}.$$

This determines D(n) if n is even and leaves two possibilities for D(n) if n is odd.
Which is the right value?
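
The parity remark follows from listing the integers in the interval [(n + 1)/2, (n + 3)/2]. A tiny sketch (the function name candidate_values is ours):

```python
def candidate_values(n):
    """Integers d with (n + 1)/2 <= d <= (n + 3)/2."""
    lo = -(-(n + 1) // 2)   # ceiling of (n + 1)/2
    hi = (n + 3) // 2       # floor of (n + 3)/2
    return list(range(lo, hi + 1))

print(candidate_values(10), candidate_values(11))
```

For even n the list has the single entry n/2 + 1, while for odd n it contains both (n + 1)/2 and (n + 3)/2.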

2. Given a graph G, is the number D(G) computable (T. Łuczak [22])? Can
one, at least, improve the computable upper bound of Theorem 1.2, that is, can one
lower the bound in Theorem 1.2 below n/2, of course, extending the class C?

3. Theorem 5.1 is an improvement in the alternation number over Theorem 1.1.
Can one as well improve on Theorem 1.2?

4. Prove analogs of Theorem 5.1 for digraphs and k-graphs.

5. Improve the constant c_d in Theorem 6.2. Find connected bounded degree
graphs G and G' of order n with linear lower bound D(G, G') = Ω(n) better than
that given by Cai, Fürer, and Immerman [8] (cf. Proposition 7.16).






6. For the optimum dimension of the Weisfeiler-Lehman algorithm we know the
bounds 0.00465 n < WL(n) ≤ 0.5 n + 0.5, where the lower bound is due to Cai,
Fürer, and Immerman [8] and the upper bound is shown in Section 7. Narrow the gap
between these bounds.

7. Generalize Theorem 8.2 to structures with maximum relation arity k, proving
a tight upper bound D_1(G, G') ≤ c_k n(1 + o(1)) for such structures G and G' of order
n. We currently know [25] that 1/2 ≤ c_k ≤ 1 − 1/(2k). In the particular case of
k-graphs, Theorem 9.1 gives us a better upper bound with c_k = 1 − 1/k. How tight
is it?

8. Find an analog of Lemma 4.4 for k-graphs. Namely, let G and G' be k-graphs
of orders n < n', σ(G) > n/2, and G' ≅ G ⊕ (n' − n)v for some v ∈ V(G) with
σ_G(v) = σ(G). By an analog of Lemma 4.2 for k-graphs (see Remark 4.9), we have
σ(G) + 1 ≤ D(G, G') ≤ D_1(G, G') ≤ σ(G) + k. How to efficiently compute the exact
value of D(G, G')?

9. Let G be a graph. Let L'(G) denote the minimum l such that over the
variable set {x_1, ..., x_l} there is a first order formula defining G. Is it true that
L'(G) = max{L(G, G') : G' ≇ G}? In other terms, are the definabilities in the
l-variable first order logic and in the l-variable infinitary logic equivalent?

The affirmative answer would follow from this assumption: There is a function f
such that, if graphs G and G' are distinguished by a formula with l variables, then
they are distinguished by a formula with l variables of quantifier rank at most f(n, l),
where n is the order of G (no dependence on the order of G'!). In combinatorial
terms: If Spoiler can win the game on G and G' with l pebbles in an arbitrary number
of rounds (reusing pebbles is allowed), then he can win with l pebbles in f(n, l)
rounds, irrespective of the order of G'. Is it true?